How to implement lossless URL shortening in C

First, a bit of context:
I'm trying to implement URL shortening on my own server (in C, if that matters). The aim is to avoid long URLs while still being able to restore a context from a shortened URL.
Currently I have an implementation that creates a session on the server, identified by a certain ID. This works, but consumes memory on the server (which is undesirable, since it's an embedded server with limited resources and the main purpose of the device isn't serving web pages but doing other cool stuff).
Another option would be to use cookies or HTML5 webstorage to store the session information in the client.
But what I'm searching for is the possibility of storing the shortened URL parameters in one parameter that I attach to the URL, and being able to reconstruct the original parameters from that one.
My first thought was to use Base64 encoding to put all the parameters into one, but this produces an even larger URL.
Currently, I'm thinking of compressing the URL parameters (using some compression algorithm like zip, bz2, ...), Base64-encoding the compressed binary blob, and using that as the context. When I get the parameter back, I can Base64-decode it, decompress the result, and have my hands on the original URL.
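To make that concrete, here is a minimal sketch of what I mean, assuming zlib for the compression step; the parameter string, buffer sizes and the URL-safe Base64 helper are just for illustration:

```c
/* Sketch: deflate the parameter string with zlib, then Base64-encode
 * the blob with a URL-safe alphabet. Decoding reverses the two steps. */
#include <stdio.h>
#include <string.h>
#include <zlib.h>

/* URL-safe Base64 alphabet ('-' and '_' instead of '+' and '/'). */
static const char b64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

static size_t b64_encode(const unsigned char *in, size_t n, char *out)
{
    size_t i, o = 0;
    for (i = 0; i + 2 < n; i += 3) {
        out[o++] = b64[in[i] >> 2];
        out[o++] = b64[((in[i] & 3) << 4) | (in[i + 1] >> 4)];
        out[o++] = b64[((in[i + 1] & 15) << 2) | (in[i + 2] >> 6)];
        out[o++] = b64[in[i + 2] & 63];
    }
    if (i < n) {                      /* 1 or 2 trailing bytes, no padding */
        out[o++] = b64[in[i] >> 2];
        if (i + 1 < n) {
            out[o++] = b64[((in[i] & 3) << 4) | (in[i + 1] >> 4)];
            out[o++] = b64[(in[i + 1] & 15) << 2];
        } else {
            out[o++] = b64[(in[i] & 3) << 4];
        }
    }
    out[o] = '\0';
    return o;
}

int main(void)
{
    /* Illustrative parameter list; repetition is what deflate exploits. */
    const char *params = "a=1&b=hello&c=world&d=longer_repeats_repeats";
    unsigned char packed[512];
    uLongf packed_len = sizeof packed;

    /* zlib's one-shot compress2(); Z_BEST_COMPRESSION trades CPU for size. */
    if (compress2(packed, &packed_len, (const Bytef *)params,
                  strlen(params), Z_BEST_COMPRESSION) != Z_OK)
        return 1;

    char token[1024];
    b64_encode(packed, packed_len, token);
    printf("%zu param bytes -> %lu compressed -> token %s\n",
           strlen(params), (unsigned long)packed_len, token);
    return 0;
}
```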
The question is: is there any other possibility that I'm overlooking that I could use to lossless compress a large list of URL parameters into a single smaller one?
Update:
After the comments from home, I realized I had overlooked that compression adds its own overhead to the data (headers, checksums and the like), which can make the compressed output even larger than the original; zipping, for example, adds a fair bit of framing around the content.
So (as home states in his comments), I'm starting to think that compressing the whole list of URL parameters is only really worthwhile once the parameters exceed a certain length; otherwise I could end up with an even larger URL than before.

You can always roll your own compression. If you apply some Huffman coding, the result will usually be smaller (but then Base64-encoding it, it'll grow a bit again, so the net effect may perhaps not be optimal).
I'm using a custom compression strategy on an embedded project I work on: first lzjb (a Lempel-Ziv derivative; follow the link for the source code, a really tight implementation from OpenSolaris), followed by Huffman coding of the compressed result.
The lzjb algorithm doesn't perform well on very short inputs, though (around 16 bytes), in which case I leave the data uncompressed.
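To handle that short-input case, here's a rough sketch of the kind of framing I mean (the flag byte and the compress_fn type are illustrative, not the actual project code): one marker byte tells the decoder whether the payload was left raw.

```c
/* Sketch: run the compressor, but keep the raw bytes (marked by a
 * one-byte flag) whenever compression does not actually shrink the
 * input. compress_fn stands in for whatever algorithm you use
 * (lzjb, zlib, ...); it returns 0 on failure or overflow. */
#include <string.h>

enum { RAW = 0, COMPRESSED = 1 };

typedef size_t (*compress_fn)(const unsigned char *in, size_t n,
                              unsigned char *out, size_t cap);

/* Returns total bytes written to out (flag byte + payload). */
static size_t pack(const unsigned char *in, size_t n,
                   unsigned char *out, size_t cap, compress_fn fn)
{
    size_t c = fn(in, n, out + 1, cap - 1);
    if (c > 0 && c < n) {             /* compression won: keep it */
        out[0] = COMPRESSED;
        return c + 1;
    }
    out[0] = RAW;                     /* compression lost: store as-is */
    memcpy(out + 1, in, n);
    return n + 1;
}
```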

Related

Any examples of using a WandSearcher in Vespa? (after a weighted set query)

Currently I am using the REST interface to query Vespa, which seems to work great, but something tells me I should be using searchers in the application to make the client (server-side code) a bit lighter (bundle the jar file in the application package) and make things a bit smoother. I have managed to write some simple searcher/processor applications, but this is a bit overwhelming.
So are there any readily available examples?
Basically, I want to:
Send to /search?query=someId
Do an ordinary search for the weighted set on this document ID (I guess this one can be handy: https://docs.vespa.ai/documentation/reference/inspecting-structured-data.html)
Take those items in the response, add them to a WandItem, and run a wand query with a WandSearcher on a given field. Similar to the YQL:
"select * from sources * where wand(interest, <some weighted set>);" with "ranking": "combined_score", and return the matches.
Just curious also: apart from the hassle of string-building the HTTP request I am doing at the moment, are there any performance gains in using a searcher, or in going the Java route vs REST?
Thanks for any insight or code I can start with.
There is an example of using the WandItem (YQL wand()) here: https://docs.vespa.ai/documentation/advanced-ranking.html, and see also https://docs.vespa.ai/documentation/using-wand-with-vespa.html, since there are two wand implementations available in Vespa; from your description, wand() sounds like the one you want for this use case. For the first call you probably want a dedicated document summary to reduce the amount of data fetched, which also gives you the option of serving it out of memory only (see https://docs.vespa.ai/documentation/document-summaries.html).
Also see https://docs.vespa.ai/documentation/searcher-development.html as a general resource on writing searchers.
For your use case it makes a lot of sense to write a searcher to perform these two queries, since the second query depends on the first and you avoid the cost of rendering/HTTP/YQL parsing, which can matter if your client is remote with high network latency.

libxml2 writer differences

The bulk of the examples I can find for libxml2 are all about loading/parsing XML files, but I'm only interested in writing them; the code will never have to parse any files. There is an example using the different writers, which shows how to use the file, memory, DOM and tree models.
Looking through the code, I don't see any significant differences between them when it comes to writing. How does one decide which is better to use? (In other words, in what cases is one better than the others?)
The differences between the 4 functions you specify are minimal; it's all about where the contents go. As Alex mentioned, if memory is a concern, using xmlNewTextWriterFilename has the advantage of not needing to hold the result in memory.
The xmlWriter API, to which all the methods you mentioned belong, is one of the APIs offered. The other of note is the tree API. xmlWriter is more like calling write() to print to a file, and the tree is more like building nested structs in memory.
The tree-based versions can be good if your data is constructed in a non-linear fashion, going back and adding/changing things based on later information, etc. That would require some workarounds/caching with the streaming xmlWriter interface, since you can't change things once they've been output; the in-memory tree, however, can be fully tweaked until the instant it's serialized.
The tree API has the downside that it has to keep the entire document in memory; the rule of thumb is that the memory requirement for a parsed tree is roughly 4x the size of the serialized XML file.
My decision usually depends on whether I expect to create large documents. If not, I use the tree API, as the flexibility will be there if I want it. If I know efficiency will be a concern, or that I'll be working with large documents, the streaming xmlWriter is the way to go.
Tree API examples can be found here: http://xmlsoft.org/examples/index.html#Tree
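For contrast, here's a minimal sketch of the streaming xmlWriter style, writing straight to a file with xmlNewTextWriterFilename; the file and element names are placeholders:

```c
/* Streaming write: the document goes straight to disk and is never
 * held in memory as a tree.
 * Compile with: gcc example.c $(xml2-config --cflags --libs) */
#include <libxml/xmlwriter.h>

int main(void)
{
    /* Second parameter is the compression level; 0 = no gzip. */
    xmlTextWriterPtr writer = xmlNewTextWriterFilename("out.xml", 0);
    if (writer == NULL)
        return 1;

    xmlTextWriterStartDocument(writer, NULL, "UTF-8", NULL);
    xmlTextWriterStartElement(writer, BAD_CAST "items");
    xmlTextWriterWriteAttribute(writer, BAD_CAST "count", BAD_CAST "1");
    xmlTextWriterWriteElement(writer, BAD_CAST "item", BAD_CAST "hello");
    xmlTextWriterEndElement(writer);    /* closes <items> */
    xmlTextWriterEndDocument(writer);   /* flushes the document */

    xmlFreeTextWriter(writer);
    return 0;
}
```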
If you're on a device with limited memory, you probably don't want to use DOM or memory-based approaches. In that case, you probably want to write out the file as you iterate through the data structure you want to write to XML.

Should "http://" be stored with a database record of a URL?

There are a number of fields a user can fill in where they'd enter a URL (their personal website, business site, favorite sites, etc etc).
It's the only thing they'd be entering in that particular field.
So should I always strip out "http://" to keep things consistent and also to reduce the possibility of broken links (i.e. "http//")?
Just not sure what the best way to store URLs is.
If there's a reason to sanitize your users' input (security, size, speed, accuracy...) then do it.
But otherwise, don't.
There's actually often a benefit in taking your customer-input data as-is: they own their typos, misspellings, broken links, etc. that way. As long as it doesn't cause a problem for you (i.e. you don't have a reason to sanitize it).
BTW, consistency is a moot point, as it won't change the data type, and you can easily check for the "http://" and add or remove it as necessary in your presentation layer with a reusable function.
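Something like this sketch, say (the function name and the scheme check are just illustrative):

```c
/* Presentation-layer helper: prepend "http://" when a stored URL
 * lacks a scheme, leave it alone otherwise. */
#include <stdio.h>
#include <string.h>

/* Writes a displayable URL into out (size outlen). Returns out. */
static char *display_url(const char *stored, char *out, size_t outlen)
{
    /* Treat anything containing "://" as already carrying a scheme. */
    if (strstr(stored, "://") != NULL)
        snprintf(out, outlen, "%s", stored);
    else
        snprintf(out, outlen, "http://%s", stored);
    return out;
}

int main(void)
{
    char buf[256];
    printf("%s\n", display_url("example.com/page", buf, sizeof buf));
    printf("%s\n", display_url("https://example.com", buf, sizeof buf));
    return 0;
}
```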
As far as I know, you can't actually call it a "URL" without the protocol part:
http://www.w3.org/Addressing/URL/url-spec.txt
I wouldn't remove it.
However, if you really need to keep the data consistent, it depends on how the URL is actually entered in your application. If it's a browser-like application, I'd bet it can be assumed to be http:// when no scheme is given, for valid links.

Looking up the input fields of a web page in C

This is in the C language.
I want to know how I can write a program to look up all the input fields of a website (any website) and then fill them in. I can write a simple web browser in VBS, but how can I analyse the input fields? Even better would be if I could click the field and it put its name in a box; that would be ideal.
Can anyone help? Thanks :)
Are you sure you want to do this in C?
I ask because it is not easy. First of all, you need to be able to run an HTTP GET request against the web page you wish to view. For this, you probably want libcurl; you definitely don't want to be writing HTTP from scratch, at any rate.
Next, you need to process the HTML you get back, finding all the input fields. You do NOT want to do this with regular expressions, if only for the sake of bobince's blood pressure; the bit you need to take away is that HTML is not a regular language, so you need a real parser. Enter libxml2. I'm sure there are other XML libraries out there, and even libraries specifically for parsing HTML.
Finally, having done that (got the fields, etc.), you need to be able to populate them and submit the correct request as per the ACTION and METHOD attributes of the FORM.
This is of course assuming you know what the fields should be filled with. It also assumes nothing else is going on: if you have a JavaScript-validated web form (I sincerely hope they validate on the server too, but they might provide feedback via JS), you won't benefit from that validation unless you integrate JS, in which case you might as well write a browser.
This is not a trivial task and it is the reason there are accessibility standards for HTML, because otherwise it becomes tricky to interpret the form without human interaction.
Of course, this all assumes said html is well formed, which isn't always the case...
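To sketch roughly what the GET-and-parse part might look like (placeholder URL, abbreviated error handling, and it only lists the fields rather than submitting anything):

```c
/* Fetch a page with libcurl, parse it with libxml2's HTML parser,
 * and list the name attribute of every <input>.
 * Build: gcc scan.c $(xml2-config --cflags --libs) -lcurl */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <curl/curl.h>
#include <libxml/HTMLparser.h>
#include <libxml/xpath.h>

struct buf { char *data; size_t len; };

/* libcurl write callback: append each received chunk to a growing buffer. */
static size_t on_data(char *p, size_t sz, size_t n, void *ud)
{
    struct buf *b = ud;
    b->data = realloc(b->data, b->len + sz * n + 1);
    memcpy(b->data + b->len, p, sz * n);
    b->len += sz * n;
    b->data[b->len] = '\0';
    return sz * n;
}

int main(void)
{
    struct buf b = { NULL, 0 };

    /* Step 1: HTTP GET via libcurl (URL is a placeholder). */
    CURL *curl = curl_easy_init();
    curl_easy_setopt(curl, CURLOPT_URL, "http://example.com/form");
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, on_data);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &b);
    if (curl_easy_perform(curl) != CURLE_OK)
        return 1;
    curl_easy_cleanup(curl);

    /* Step 2: parse with the HTML parser, which tolerates
     * real-world, not-well-formed markup. */
    htmlDocPtr doc = htmlReadMemory(b.data, (int)b.len, NULL, NULL,
                                    HTML_PARSE_RECOVER | HTML_PARSE_NOERROR |
                                    HTML_PARSE_NOWARNING);
    xmlXPathContextPtr ctx = xmlXPathNewContext(doc);
    xmlXPathObjectPtr res = xmlXPathEvalExpression(BAD_CAST "//input", ctx);

    /* Step 3: list the name of every <input> field found. */
    if (res != NULL && res->nodesetval != NULL) {
        for (int i = 0; i < res->nodesetval->nodeNr; i++) {
            xmlChar *name = xmlGetProp(res->nodesetval->nodeTab[i],
                                       BAD_CAST "name");
            printf("input field: %s\n", name ? (char *)name : "(unnamed)");
            xmlFree(name);
        }
    }
    /* Cleanup of res/ctx/doc/b.data omitted for brevity. */
    return 0;
}
```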
I might suggest another approach. BeautifulSoup is a well-known Python web-scraping library that works very well, and Python as a language allows easier string manipulation, which will dramatically cut down your development time. I'd suggest giving the need to use C some serious thought, given the size and complexity of the task you want to undertake versus your need to get a result quickly. If you have a lot of time, by all means go for C.

Cheapest Way To Export/Import Array Contents To File - AS3/AIR

I'm working on a basic editor application. It uses an array of varying size that I want to store to disk. This will eventually be in an AIR application, but for now it's just an AS3 project in Flex.
I want to store the array in a file. The application edits the data, so it doesn't need to be human readable. I want it to be in whatever format will be quickest to store and load back into the array when I need that data again.
Any recommendations?
Edit: It strikes me that importing/exporting in such a way that the result can be immediately cast as an Array would probably be the cheapest thing, rather than some sort of iterating, if that's possible. Another obvious option is storing the data as a simple comma-delimited string and using String.split() to get an array back. Though again, the question is what would be cheapest, and I'm not quite convinced that's it.
I'll also add that it needs to be in some sort of permanent file, so a shared object, while possibly the fastest, isn't really a long-term solution.
I think the fastest and easiest way is to use a shared object. It stores native objects, so there are no serialization/deserialization steps involved: just assign the value and read it back.
Performance-wise, it's probably the fastest route as well. If you're working with a large dataset and are sure it will be an AIR app, you can use AIR's database, but that will definitely take much more work.
First, take a look at this answer.
As for saving the contents of an Array, consider JSON, using the export tools provided by Adobe.
