Autocomplete for Huge Arrays

I am building a web app where the user can search for a location, and the possible spots are drawn from a database of around 10,000 entries. I would like to use the jQuery UI autocomplete plugin for this, and I am wondering if it is realistic to load 10,000 sites into the array that it searches. If not, what can I do to make it work and speed it up?
Thanks!

You probably don't want to send 10,000 locations to each browser. Check out: http://jqueryui.com/demos/autocomplete/#remote
jQuery will send the partial string to the server once it passes 2 characters (in that example); you then send back 10 or so matches. As the user types more characters, the matches are refined further until the user sees the one they want.
I've done this with substring matches as well, although the fast and typical way is to match the start of the string.
On the server side, you probably want to cache the matches somehow.
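As a rough illustration of the remote-source approach (assuming jQuery and jQuery UI are loaded; the /locations/search endpoint and its ~10-match cap are assumptions, not part of the plugin):

    $("#location").autocomplete({
      minLength: 2, // don't hit the server until 2 characters have been typed
      source: (request: { term: string }, response: (items: string[]) => void) => {
        // Hypothetical endpoint that returns at most ~10 matching location names.
        $.getJSON("/locations/search", { q: request.term }, (matches: string[]) => {
          response(matches);
        });
      },
    });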

Related

What is a good way to store text that contains checkboxes in a database?

I'm currently working on a notes app because I dislike every notes app I could find on the app store: the lists you can create are either:
only text
only checkboxes
lines of text followed by checkboxes or vice versa
But so far I have found no app (except for Samsung's proprietary Samsung Notes app, which apparently only works on Samsung smartphones) that allows alternating lines of text and checkboxes.
When thinking about how to implement this properly, I had trouble coming up with a good way of identifying when there is a checkbox and when there should be a line of text. For obvious reasons, just storing text and using an identifier like <chkbox> (or something along those lines) is not a good solution, because checkboxes should only be placeable by clicking the respective icon. The user should neither be prevented from typing the term <chkbox> (they might want to use it for whatever reason), nor should injections like this be possible.
The best idea I have come up with so far is to store each line individually (split on \n) in a list with a dynamic datatype, and for each line store either a text object (or string) or a checkbox item. Finally, I could store the list as one single object in the database. However, I'm unsure whether that's good practice, as I imagine it makes changes to the list quite cumbersome (e.g. to change the ticked value of a single checkbox we would need to store the entire data structure again instead of just changing one boolean in the database).
I'm working in Flutter and use Firebase Firestore as my backend/database. Although I'm mostly looking for a general approach/solution with the chosen technologies in mind: is there any solution that would work better than my current idea, or are there any serious flaws or drawbacks that I have overlooked? Thanks in advance for any constructive input.
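For what it's worth, a minimal sketch of the structure described above, written as TypeScript types purely for illustration (field names are made up; in Flutter/Firestore this would be a list of maps stored on one document):

    // Hypothetical shape for one note: an ordered list of lines,
    // where each line is either plain text or a checkbox item.
    type NoteLine =
      | { kind: "text"; text: string }
      | { kind: "checkbox"; label: string; checked: boolean };

    interface NoteDocument {
      title: string;
      lines: NoteLine[]; // stored as a single array field on one Firestore document
    }

    // Toggling one checkbox means rewriting the whole "lines" array,
    // which is exactly the update-cost concern raised above.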

Custom Searcher - Blending of hits from different sources

We have a need for "blending of hits from different sources"; as per your documentation it is recommended to write a custom Searcher in Java. Is there a demo of this written somewhere on GitHub? I wouldn't even know where to start :( I understand I can create search "chains", preferably asynchronous, and then blend results in Java before returning them... but then how would I handle pagination, limits, etc.? This all seems very complicated for someone who doesn't know Java that well. So, I am hoping someone has already written a demo for this? Please? Anyone?
Thank you so much
EDIT to make my question clearer:
We are writing a search engine that fetches data from various websites. Some websites have 10 million indexable items, other websites only 100,000. When we present the results to the end user, we want to include results from all our sources (when a match applies). Let's say 10 results from each of the websites we crawl, so that they all get an equal amount of attention on the page. If we don't do custom blending, what happens is that the largest website with the most items wins all our traffic.
I understand that we can send 10 separate queries to Vespa and blend the results in our front end, but that seems very inefficient. Hence the question about a custom Searcher. Thank you so much!
That documentation covers some very advanced use cases which you do not have. Are your sources different Vespa schemas or content clusters? If so, Vespa will by default blend the hits returned from each according to their relevance scores, so there's nothing you need to do.
The two other most common use-cases are:
Some (or all) of the data sources are external, so you need to write a Searcher component to fetch the external data and turn it into a Result.
You want the data to be blended in some custom way (rather than by relevance score). If so, you need to exclude the default blending Searcher (com.yahoo.prelude.searcher.BlendingSearcher) and write your own.
If you provide some more information about your use cases I can give you some code examples.
EDIT: Use grouping to solve the need explained under "EDIT" in the question:
Create a "siteid" field when feeding (e.g in document processing).
Use the grouping expression all(group(siteid) each(max(10) output(summary())))
See http://docs.vespa.ai/documentation/grouping.html
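If it helps, a rough sketch of sending that grouping expression to Vespa's HTTP query API (the endpoint, port, and the siteid field are assumptions taken from this thread, not verified against your deployment):

    // Hedged sketch: query Vespa over HTTP and group hits per site.
    const yql =
      "select * from sources * where userQuery() " +
      "| all(group(siteid) each(max(10) output(summary())))";

    async function blendedSearch(userQuery: string): Promise<unknown> {
      const params = new URLSearchParams({ yql, query: userQuery });
      const res = await fetch(`http://localhost:8080/search/?${params}`);
      const json = await res.json();
      return json.root.children; // grouped hits: up to 10 per siteid
    }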

How to make JSON load faster with large data (over HTTP or on a web page)

When requesting the page (over HTTP or as a web page), it is very slow or even crashes unless I load my JSON with less data. I really need to solve this since sooner or later I will be using large amounts of data frequently. Here is my JSON data --->>>
Notes:
1. The JSON contains only strings and integers.
2. I view my JSON as a tree view using the JSONView plugin for Google Chrome.
I am using Angular and Node.js. Thank you.
A quick summary of all the things that come to my mind:
I had a similar issue once. My solutions may require changes to the UI.
Pagination
I doubt you can display that much data at one time, so the strategy should be to divide your data into small chunks and only load more when the client asks for it.
This way, the whole dataset is no longer stored in RAM as it is currently. This is how forums work (only 20 topics at a time).
Just imagine if StackOverflow made you load the whole history of questions on the main page; how many GB would your browser need just for that?
You can use pagination in a classic way (buttons with page numbers, like Google) or as infinite scroll, as you prefer.
For that you need to adapt your API and keep track, at every moment, of the index of the pages you have already loaded in your front end. There are plenty of examples for AngularJS.
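As a minimal sketch of the server side (the Express route, path, and page size are assumptions for illustration):

    import express from "express";

    const app = express();
    const PAGE_SIZE = 20;

    // Placeholder for however the full dataset is actually loaded on the server.
    const allItems: Array<{ id: number; name: string }> = [];

    // Hypothetical paginated endpoint: the client asks for one page at a time
    // instead of downloading the whole JSON in one response.
    app.get("/api/items", (req, res) => {
      const page = Number(req.query.page ?? 0);
      const start = page * PAGE_SIZE;
      res.json({
        page,
        pageSize: PAGE_SIZE,
        total: allItems.length,
        items: allItems.slice(start, start + PAGE_SIZE),
      });
    });

    app.listen(3000);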
Only show the beginning of the data
When you look at Facebook comments, you may see a "show more" button. In their case, maybe it's there to not break the UI, but it can also be used to avoid loading too much data.
You can display only the main lines of your data (titles or similar) and add a button so the user can load more details if they want.
In your data model, the cost seems to be on the second level of "C". Just load data until this second level, and download the remaining part (for this object) only if the user asks for it.
Once again, no need to overload: your client's RAM will be thankful, and your client's mobile 3G connection too.
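For illustration, a small sketch of loading the detail for one object on demand (the route and renderDetails function are hypothetical):

    // Fetch the heavy part of one item only when the user clicks "show more".
    async function showMore(itemId: number): Promise<void> {
      const res = await fetch(`/api/items/${itemId}/details`); // hypothetical endpoint
      const details = await res.json();
      renderDetails(itemId, details); // however the UI displays the extra data
    }

    // Stand-in for the actual rendering code.
    declare function renderDetails(itemId: number, details: unknown): void;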
Optimize your data structure
If this is still not enough:
As StefanArya said in a comment, remove the "I" attribute, which is redundant with the JSON key; you can use Object.keys() to get the key names instead.
You also may not need that much precision on your floats.
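Purely as an illustration (the { I, V } shape below is a guess, since the actual JSON isn't shown here):

    // Drop the redundant "I" field and round floats before sending the JSON.
    const raw: Record<string, { I: string; V: number }> = {
      a1: { I: "a1", V: 3.14159265 },
    };

    const compact = Object.fromEntries(
      Object.entries(raw).map(([key, { V }]) => [key, Number(V.toFixed(2))])
    );
    // The key itself replaces the "I" attribute: Object.keys(compact) -> ["a1"]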
If I come up with any other ideas, I'll edit this post later.

Showing 1 million rows in a browser

Our utility has one single table with 10 million to 50 million rows. There may be a case where we need to show 50 million rows in a single HTML client page. To show the rows in the browser we use jQuery in the UI.
To retrieve rows we use Hibernate and Spring MVC. I am looking for the best practice for retrieving the rows and showing them in the UI. Should I retrieve a bulk of a thousand or two thousand rows at a time in Hibernate and buffer them to the web client, or is there a better practice?
The best practice is not to do this. It will explode the browser memory and rendering engine, and will take too much time to load.
Add a search form to your webapp, make the end user search for what he's interested in, and only display the first N search results, just like Google does.
Nobody is able to do anything meaningful with 50 million rows without searching anyway.
I think you should use scroll pagination (when the user reaches almost the bottom of the page, make an AJAX call and load more data).
Just as an example, a quick Google search turns up an example & demo.
And if your data is tabular, then you can use jqGrid.
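A rough sketch of that scroll-triggered loading (the endpoint and appendRows are hypothetical):

    // Load the next page when the user scrolls near the bottom of the page.
    let nextPage = 0;
    let loading = false;

    window.addEventListener("scroll", async () => {
      const nearBottom =
        window.innerHeight + window.scrollY >= document.body.offsetHeight - 200;
      if (!nearBottom || loading) return;

      loading = true;
      const res = await fetch(`/api/rows?page=${nextPage}`); // hypothetical endpoint
      appendRows(await res.json());
      nextPage += 1;
      loading = false;
    });

    // Stand-in for whatever adds the rows to the table/list in the DOM.
    declare function appendRows(rows: unknown): void;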
Handling a large quantity of data in an application must be done via virtualization. While it's true that the user will be overwhelmed by millions of records, it's not exactly true that they can't do anything with them, nor that such quantities of data are unmanageable.
In practice, and depending on what you're doing, you'll notice that this limit crops up on you with just thousands of records, which frankly is very little data. Data-centric apps just need a different approach altogether if they are going to work in a browser and perform well.
The way we do this is quite simple but not all that straightforward.
It helps to decide on a fixed row height, because you need to know the maximum height of the scrollable container. You then render into this container only the subset of records that can be visible at any given moment and position them accordingly (based on scroll events). There are more and less efficient ways of doing this.
The end goal remains the same. You basically need to cull everything which isn't directly visible on screen in such a way that the browser isn't paying the cost of memory and layout logic for the app to be responsive. This is common practice in game development, only the world that is visible right now on screen is ever present at any given moment. So that's what you need to do to get large quantities of stuff to behave well.
In the context of a browser, anything that contributes to memory use and layout/render cost needs to go away if it isn't absolutely vital.
You can also stagger and smear recalculations so that you don't incur the cost of whatever is causing the app to degrade on every small update. The user can afford to wait a second, as long as the app remains responsive.
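A minimal sketch of that fixed-row-height virtualization (the element ids, row height, and string row type are all assumptions for illustration):

    // Only the rows intersecting the viewport are ever present in the DOM.
    const ROW_HEIGHT = 24;          // px, fixed so the total height is predictable
    const rows: string[] = [];      // the full dataset (labels), already in memory

    const viewport = document.getElementById("viewport")!; // scrollable container
    const content = document.getElementById("content")!;   // tall inner element
    content.style.position = "relative";
    content.style.height = `${rows.length * ROW_HEIGHT}px`; // fake the full height

    function renderVisible(): void {
      const first = Math.floor(viewport.scrollTop / ROW_HEIGHT);
      const visible = Math.ceil(viewport.clientHeight / ROW_HEIGHT) + 1;

      // Re-render only the visible slice, absolutely positioned at its offset.
      content.innerHTML = "";
      rows.slice(first, first + visible).forEach((row, i) => {
        const div = document.createElement("div");
        div.textContent = row;
        div.style.position = "absolute";
        div.style.top = `${(first + i) * ROW_HEIGHT}px`;
        div.style.height = `${ROW_HEIGHT}px`;
        content.appendChild(div);
      });
    }

    viewport.addEventListener("scroll", renderVisible);
    renderVisible();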

How to implement lossless URL shortening

First, a bit of context:
I'm trying to implement URL shortening on my own server (in C, if that matters). The aim is to avoid long URLs while being able to restore a context from a shortened URL.
Currently I have an implementation that creates a session on the server, identified by a certain ID. This works, but consumes memory on the server (which is not desirable since it's an embedded server with limited resources, and the main purpose of the device isn't serving web pages but doing other cool stuff).
Another option would be to use cookies or HTML5 webstorage to store the session information in the client.
But what I'm looking for is the possibility to store the URL parameters in a single parameter that I attach to the URL, and to be able to reconstruct the original parameters from that one.
First thought was to use a Base64-encoding to put all the parameters into one, but this produces an even larger URL.
Currently, I'm thinking of compressing the URL parameters (using some compression algorithm like zip, bz2, ...), Base64-encoding the compressed binary blob, and using that as the context. When I get the parameter back, I can Base64-decode it, decompress the result, and recover the original URL.
The question is: is there any other possibility that I'm overlooking that I could use to losslessly compress a large list of URL parameters into a single, smaller one?
Update:
After the comments from home, I realized that I overlooked that compression itself adds some overhead to the compressed data (for example, the header that zipping adds to the content), which can make the result even larger than the original.
So (as home states in his comments), I'm starting to think that compressing the whole list of URL parameters is only really useful if the parameters exceed a certain length, because otherwise I could end up with an even larger URL than before.
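To make the idea concrete, a minimal sketch of that round trip using Node's zlib purely for illustration (the real implementation would be in C with whatever compression library fits the device; the example parameter string is made up):

    import { deflateRawSync, inflateRawSync } from "node:zlib";

    // Compress the parameter string and make it URL-safe.
    function shorten(params: string): string {
      const compressed = deflateRawSync(Buffer.from(params, "utf8"));
      return compressed.toString("base64url"); // URL-safe Base64, no padding
    }

    // Reverse the process to recover the original parameters.
    function restore(token: string): string {
      const compressed = Buffer.from(token, "base64url");
      return inflateRawSync(compressed).toString("utf8");
    }

    // For short parameter strings the token can come out longer than the input,
    // which is exactly the overhead concern from the update above.
    const original = "lat=47.3769&lon=8.5417&zoom=12&layer=traffic"; // made-up example
    const token = shorten(original);
    console.log(token.length, original.length, restore(token) === original);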
You can always roll your own compression. If you simply apply some Huffman coding, the result will usually be smaller (but after Base64-encoding it, it will grow a bit, so the net effect may not be optimal).
I'm using a custom compression strategy on an embedded project I work on, where I first use lzjb (a Lempel-Ziv derivative; follow the link for source code, a really tight implementation from OpenSolaris) followed by Huffman coding of the compressed result.
The lzjb algorithm doesn't perform too well on very short inputs, though (~16 bytes, in which case I leave it uncompressed).
