How do Stack Overflow permalinks work?

How does Stack Overflow manage permalinks? Take an arbitrary question like: ASP.NET + jQuery + Dynamically Created HyperLink controls. What happens if the same user posts another question with the same title? I think the number before the title, /1002230/, is the key, but on what basis is it created? Is it an indicator of how many questions there are on Stack Overflow?
I ask because I am trying to use the title of a blog post in its permanent link. However, this won't allow multiple blog posts with the same title, and I don't want to use an ID number either. I am in fact using App Engine for this app, so the generated key is something like 'ahVzYW5qaGFjaG9vbGhhLXNhbmRib3hyCwsSBUFjdG9uGFUM', which is surely not nice. Any hints on how to prettify my URLs?
Thanks

Rather than using the Key().str(), it might be nicer to use the Key's id or key_name with Model.get_by_id() or Model.get_by_key_name(). These are both more user friendly (integer or supplied string respectively).

The number is the question id which is unique - they only increment and are not reused. The human-readable string is ignored by the server when you retrieve the page by URL - it's for convenience and pretty look - to solve the same task you have. So each question URL has two parts - the machine-readable (the starting part and the question id) and the human-readable - the filtered question title afterwards.
With the exclusion of deleted questions the number is the total number of questions on the site. Numbers of the deleted questions are not reused, so no conflict is possible.

The ID number does give some indication of how many total posts there are. I believe I recall from a podcast that both questions and answers are stored as 'posts', but comments may be posts as well.

The approach I like best is to use URLs formatted as "/1234/slug-goes-here" (like StackOverflow) or "/1234-slug-goes-here". With a little cleverness, you can ignore the content of the slug, and fetch based on the ID only, which means that links work even if they were truncated by mail software, IRC, etc.
The other approach, App Engine wise, is to use key names - make the slug the key name, which means you can look it up with MyModel.get_by_key_name(slug). This is how Bloog does it.
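As a rough illustration of the "/1234/slug-goes-here" approach, here is a minimal Python sketch. The function names (slugify, build_permalink, parse_post_id) are made up for this example and are not part of App Engine or Stack Overflow; the point is only that the lookup uses the numeric ID and ignores whatever slug text follows it.

```python
import re


def slugify(title):
    """Lower-case the title and collapse every run of non-alphanumerics into '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def build_permalink(post_id, title):
    """Build '/<id>/<slug>'; the slug is decoration, the id is the real key."""
    return "/%d/%s" % (post_id, slugify(title))


# Accepts '/1234', '/1234/anything' and '/1234-anything'.
_PERMALINK = re.compile(r"^/(\d+)(?:[/-].*)?$")


def parse_post_id(path):
    """Return the numeric id from a permalink path, or None if there is none."""
    match = _PERMALINK.match(path)
    return int(match.group(1)) if match else None


link = build_permalink(1002230, "ASP.NET + jQuery + Dynamically Created HyperLink controls")
print(link)
print(parse_post_id(link))                  # full link
print(parse_post_id("/1002230/asp-net-jq")) # truncated slug still resolves
```

On App Engine the parsed ID could then be passed to something like MyModel.get_by_id(), as mentioned in the answers above. Two posts with the same title get the same slug but different IDs, so their URLs never collide.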

Related

How to save user specific arrays in Mailchimp

For quite a while I am struggling with how to save custom user specific arrays of data in Mailchimp.
Simple example: I want to save the project ids for a user in Mailchimp and, in the best case, be able to use them there properly as well. Let's say user fritz@frey.com has the 5 project ids 12345, 25345, 21342, 23424 and 48935. Why is there no array merge field that lets me save this array of project ids to a user?! (Or is there one and I am just blind...)
I know I can do drop down fields to put users in multiple groups, like types of projects for example, but the solution can hardly be a drop down with all (several thousand) project ids and I check the ones the user is a part of (and I doubt that Mailchimp would support that solution for a large number of group items anyways).
Oh and of course I could make the field myself by abusing a string field and connect the project ids with commas or a json string but that seems neither like a clean solution nor could I use the data properly in Mailchimp (as far as I know).
I googled quite a bit and couldn't find anything helpful sadly... :(
So? Can anybody enlighten me? :)
Thanks for all your help!

It sounds like you have already arrived at the correct answer: there is no "array" type, other than the interests type, which is global and not quite the same as an array.
The best solution here sort of depends on your data. If each project ID will have many different subscribers attached to it, and there won't be too many of them active at any given time, I'd just use interests. If you think there may be dozens of project ids active simultaneously, I'd not store this data on the subscribers at all, instead I'd build static segments for each project, and add users to them.
If projects won't have a bunch of subscribers associated, I'd store the data on your end and/or continue using the comma-separated string field.

Mark a post as read on appengine

I'm currently designing an application similar to twitter/jaiku/reddit in structure. Basically there are small posts with upvotes and downvotes, and they are sorted by score and time like reddit.
I've gotten all of this working, but now our requirements have changed a bit, and we need the user to be able to mark a post as 'read'. This would make the post no longer show up in that user's feed. I can model this with a Read entity for each tuple of (User, Post), but this would require a lot of work to find posts which 'do not' exist in that table. Alternatively I can invert that relation so that I have one entity for each unread post, and it becomes much easier to find which posts 'do' exist in the table... But then I'd need to create an entry in this table for every single user everytime a post is made. This would not scale well.
My question is this: How would I model this sort of negative information in appengine's datastore? I'm using the go runtime if that matters, but answers for any runtime are fine.

This would be a many-to-many relationship. This article describes how to model different kinds of relationships, including many-to-many. The only issue is that I'm not sure whether you should store a list of read posts on the user, or a list of users who have read it on the post, as both lists might get large in different situations. If posts are relatively private, and not seen by many people, you could store a list of user keys on the post model. But if one post could be seen by thousands of people, it might be better to store a list of posts on the users, as there will probably not be many users with thousands of read posts. Another option might be to discard old posts, or just discard their read state.
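To make the "list of read posts on the user" option concrete, here is a small in-memory Python sketch. It does not use the datastore API at all; the User class and the mark_read and unread_feed helpers are hypothetical names for this example, and a real app would persist read_post_ids as a list property on the user entity.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    # IDs of posts this user has marked as read (stored on the user, not the post).
    read_post_ids: set = field(default_factory=set)


def mark_read(user, post_id):
    """Record that the user has read the post."""
    user.read_post_ids.add(post_id)


def unread_feed(user, ranked_post_ids):
    """Filter an already-ranked feed, keeping the ranking order."""
    return [pid for pid in ranked_post_ids if pid not in user.read_post_ids]


alice = User("alice")
mark_read(alice, 42)
print(unread_feed(alice, [7, 42, 13]))
```

The feed is still queried and ranked as before; the "negative" information is applied as a filter afterwards, so no entity has to be created per (user, post) pair at posting time.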

Pattern matching of websites

I maintain a global repository of sites in a table.
website:
id, name, url
1 google http://www.google.com/
2 CNN http://www.cnn.com/
3 SO http://www.stackoverflow.com/
I maintain a reference table, which stores the website ids the user has saved.
userwebsite
userid, websiteid
[attributes of the table]
Say a user wants to save Microsoft in his collection, and he enters
www.microsoft.com
As the website doesn't exist in the global repository, it is first added to the repository and then to his collection. Now the contents of both tables look something like this:
website:
id, name, url
1 google http://www.google.com/
2 CNN http://www.cnn.com/
3 SO http://www.stackoverflow.com/
4 msft http://www.microsoft.com
userwebsite:
userid, websiteid
1 4
Say a user is interested in saving google in his collection, and he enters
www.google.com
As the website already exists in the repository, it is not added again; only the reference gets added to the user's collection.
Here is where I am stuck:
both www.google.com and http://www.google.com/
semantically point to the same site, but when you try to match them they are 2 distinct strings. How should I go about matching the strings in such cases?
One solution I can think of: when a site is entered, first check whether the domain exists in the collection of websites (a PATINDEX would probably do here). This gives you a list of sites which have the same domain name. Then check whether the path exists in any of the resulting websites. Is this a good idea?
Does a well-established solution to this problem exist? Are there any better ways to go about it?

You don't need pattern matching in this case, what you are really asking for (to continue from what Matteo commented about) is a way of validating web addresses and storing them in a consistent way. But if you want a regular expression to at least determine if the address is valid you can have a look here: http://www.shauninman.com/archive/2006/05/08/validating_domain_names
Or use JavaScript to validate it, although you don't say what language you are using outside of SQL Server.
Strictly speaking, you would almost need to send the domain name to a Domain Name Server to resolve it before storing it in your table. It may be better to ignore the fact that they are web addresses and just think of them as strings. For example, how would you ensure people's names were compared correctly in a database? The first step is usually to ensure that upper or lower case is used consistently; from then on it becomes more difficult, such as handling middle names/initials which may be omitted.
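One way to store addresses "in a consistent way" is to normalize each one into a canonical key before comparing or inserting. The Python sketch below is only an illustration under some assumptions that are not in the question: a missing scheme is treated as http, http and https are treated as the same site, and a trailing slash is ignored. The function name normalize_url is made up for this example.

```python
from urllib.parse import urlsplit


def normalize_url(raw):
    """Reduce a user-entered address to a canonical 'host/path' key."""
    raw = raw.strip()
    # Assumption: an address typed without a scheme means http.
    if "://" not in raw:
        raw = "http://" + raw
    parts = urlsplit(raw)
    # .hostname is already lower-cased and drops any port or credentials.
    host = parts.hostname or ""
    # Assumption: a trailing slash does not change which site is meant.
    path = parts.path.rstrip("/")
    return host + path


print(normalize_url("www.google.com"))
print(normalize_url("http://www.google.com/"))
```

Both calls produce the same key, so the key can be stored in its own column with a unique index and matched with a plain equality comparison instead of pattern matching. Whether "www.example.com" and "example.com" should also be merged is a separate decision this sketch does not make.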

Survey Data Model

I'm developing a simple survey module for an ASP application I'm working on and I'd like to get some suggestions on the data model.
Questions can be one of three types - multiple choice, multiple answer; multiple choice, single answer, and free response.
I'm thinking of the following tables:
Question - with a question type discriminator field
PossibleAnswers- with a questionID and answer text field
SurveyQuestionResponse- with a questionID, a clientID, and answer text
Am I making this too simple?

Take a look at the Data Model library at databaseanswers.org
Models #76 thru #81 seem pertinent, if only for "inspiration".
A lot depends on the level of sophistication of the surveys you manage. Some surveys, in particular dynamic ones (aimed at removing some of the bias), require additional fields for storing properties such as the probability with which a particular question (or reply) is used, the many forms of a question and their associated probabilities, and also a record of the questions and suggested replies that were actually offered to a given surveyee.

What are some techniques for stored database keys in URL

I have read that using database keys in a URL is a bad thing to do.
For instance,
My table has 3 fields: ID:int, Title:nvarchar(5), Description:Text
I want to create a page that displays a record. Something like ...
http://server/viewitem.aspx?id=1234
First off, could someone elaborate on why this is a bad thing to do?
and secondly, what are some ways to work around using primary keys in a url?

I think it's perfectly reasonable to use primary keys in the URL.
Some considerations, however:
1) Avoid SQL injection attacks. If you just blindly accept the value of the id URL parameter and pass it into the DB, you are at risk. Make sure you sanitise the input so that it matches whatever format of key you have (e.g. strip any non-numeric characters).
2) SEO. It helps if your URL contains some context about the item (e.g. "big fluffy rabbit" rather than 1234). This helps search engines see that your page is relevant. It can also be useful for your users (I can tell from my browser history which record is which without having to remember a number).
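For point 1, a minimal Python sketch of validating the id and passing it as a bound parameter is shown below. It uses the standard sqlite3 module purely for illustration; the items table and the fetch_item helper are assumptions for this example, not part of the question's ASP.NET setup.

```python
import sqlite3


def fetch_item(conn, raw_id):
    """Look up one item by the id taken from the URL, or return None."""
    # Reject anything that is not a plain integer before touching the database.
    try:
        item_id = int(raw_id)
    except (TypeError, ValueError):
        return None
    # The value is passed as a bound parameter, never spliced into the SQL text.
    cursor = conn.execute("SELECT id, title FROM items WHERE id = ?", (item_id,))
    return cursor.fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO items VALUES (1234, 'big fluffy rabbit')")
print(fetch_item(conn, "1234"))
print(fetch_item(conn, "1234 OR 1=1"))
```

The parameter binding is what actually prevents injection; the integer check is an extra guard that also gives a clean "not found" for malformed URLs.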

It's not inherently a bad thing to do, but it has some caveats.
Caveat one is that someone can type in different keys and maybe pull up data you didn't want / expect them to get at. You can reduce the chance that this is successful by increasing your key space (for example making ids random 64 bit numbers).
Caveat two is that if you're running a public service and you have competitors they may be able to extract business information from your keys if they are monotonic. Example: create a post today, create a post in a week, compare Ids and you have extracted the rate at which posts are being made.
Caveat three is that it's prone to SQL injection attacks. But you'd never make those mistakes, right?

Using IDs in the URL is not necessarily bad. This site uses it, despite being done by professionals.
How can they be dangerous? When users are allowed to update or delete entries belonging to them, developers implement some sort of authentication, but they often forget to check if the entry really belongs to you. A malicious user could form a URL like "/questions/12345/delete" when he notices that "12345" belongs to you, and it would be deleted.
Programmers should ensure that a database entry with an arbitrary ID really belongs to the current logged-in user before performing such operation.
Sometimes there are strong reasons to avoid exposing IDs in the URL. In such cases, developers often generate random hashes that they store for each entry and use those in the URL. A malicious person tampering in the URL bar would have a hard time guessing a hash that would belong to some other user.
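Both ideas, the ownership check and the random public identifier, can be sketched in a few lines of Python. Everything here is hypothetical scaffolding for illustration: the Entry class, the NotAllowed exception, and the in-memory entries dict stand in for whatever model and storage the application really uses.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Entry:
    owner_id: int
    body: str
    # Random, unguessable identifier used in URLs instead of the database id.
    public_token: str = field(default_factory=lambda: secrets.token_urlsafe(16))


class NotAllowed(Exception):
    """Raised when the current user may not touch the requested entry."""


def delete_entry(entries, token, current_user_id):
    """Delete the entry only if it exists and belongs to the logged-in user."""
    entry = entries.get(token)
    if entry is None or entry.owner_id != current_user_id:
        raise NotAllowed("entry not found or not owned by this user")
    del entries[token]


entry = Entry(owner_id=1, body="hello")
entries = {entry.public_token: entry}
delete_entry(entries, entry.public_token, current_user_id=1)
print(len(entries))
```

The random token only makes other users' entries hard to guess; the owner comparison is what actually stops a malicious request, so it is needed even when tokens are used.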

Security and privacy are the main reasons to avoid doing this. Any information that gives away your data structure is more information that a hacker can use to access your database. As mopoke says, you also expose yourself to SQL injection attacks, which are fairly common and can be extremely harmful to your database and application. From a privacy standpoint, if you are displaying any information that is sensitive or personal, anybody can just substitute a number to retrieve information, and if you have no mechanism for authentication, you could be putting your information at risk. Also, if it's that easy to query your database, you open yourself up to Denial of Service attacks, with someone just looping through URLs against your server since they know each one will get a response.
Regardless of the nature of the data, I tend to recommend against sharing anything in the URL that could give away anything about your application's architecture. It seems to me you are just inviting trouble (I feel the same way about hidden fields which aren't really hidden).
To get around it, we usually encrypt the parameters before passing them. In some cases, the encrypted URL also includes some form of verification/authentication mechanism so the server can decide if it's OK to process.
Of course every application is different and the level of security you want to implement has to be balanced with functionality, budget, performance, etc. But I don't see anything wrong with being paranoid when it comes to data security.
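The answer above talks about encrypting parameters. The Python sketch below swaps in a simpler, related technique: it leaves the ID readable but appends an HMAC signature, which covers the "verification mechanism" part, so the server can tell whether the value was tampered with. SECRET_KEY, sign_id and verify_signed_id are made-up names, and truncating the signature to 16 hex characters is an arbitrary choice for shorter URLs.

```python
import hashlib
import hmac

# Assumption: a server-side secret that is never sent to the client.
SECRET_KEY = b"replace-with-a-real-secret"


def _signature(item_id):
    """Hex HMAC-SHA256 of the id, truncated to 16 characters (64 bits)."""
    mac = hmac.new(SECRET_KEY, str(item_id).encode("ascii"), hashlib.sha256)
    return mac.hexdigest()[:16]


def sign_id(item_id):
    """Return '<id>.<signature>' for use in a URL parameter."""
    return "%d.%s" % (item_id, _signature(item_id))


def verify_signed_id(value):
    """Return the id if the signature matches, otherwise None."""
    raw_id, sep, mac = value.partition(".")
    if not sep or not (raw_id.isascii() and raw_id.isdigit()):
        return None
    expected = _signature(int(raw_id))
    # Constant-time comparison on bytes, so odd input cannot raise or leak timing.
    if hmac.compare_digest(mac.encode("utf-8"), expected.encode("ascii")):
        return int(raw_id)
    return None


token = sign_id(1234)
print(verify_signed_id(token))
print(verify_signed_id("1235." + token.split(".")[1]))
```

Signing does not hide the ID, so it does not address the "rate of growth" concern; it only stops users from substituting other numbers. It is also not a replacement for checking that the logged-in user is allowed to see the record.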

It's a bit pedantic at times, but you want to use a unique business identifier for things rather than the surrogate key.
It can be as simple as ItemNumber instead of Id.
The Id is a db concern, not a business/user concern.

Using integer primary keys in a URL is a security risk. It is quite easy for someone to post using any number. For example, through normal web application use, a user creates a user record with an ID of 45 (viewitem/id/45). This tells the user that there are probably 44 other users. And unless you have a correct authorization system in place, they can see another user's information by creating their own URL (viewitem/id/32).
2a. Use proper authorization.
2b. Use GUIDs for primary keys.

Showing the key itself isn't inherently bad because it holds no real meaning, but showing the means to obtain access to an item is bad.
For instance, say you had an online store that sold stuff from 2 merchants. Merchant A has items (1, 3, 5, 7) and Merchant B has items (2, 4, 5, 8).
If I am shopping on Merchant A's site and see:
http://server/viewitem.aspx?id=1
I could then try to fiddle with it and type:
http://server/viewitem.aspx?id=2
That might let me access an item that I shouldn't be accessing, since I am shopping with Merchant A and not B. In general, allowing users to fiddle with stuff like that can lead to security problems. Another brief example is employees who can look at their own personal information (id=382) but type in someone else's id to go directly to that person's profile.
Now, having said that, this is not bad as long as security checks are built into the system to make sure people are doing only what they are supposed to (e.g. not shopping with another merchant or not viewing another employee).
One mechanism is to store information in sessions, but some do not like that. I am not a web programmer so I will not go into that :)
The main thing is to make sure the system is secure. Never trust data that came back from the user.

Everybody seems to be posting the "problems" with using this technique, but I haven't seen any solutions. What are the alternatives? There has to be something in the URL that uniquely defines what you want to display to the user. The only other solution I can think of would be to run your entire site off forms, and have the browser post the value to the server. This is a little trickier to code, as all links need to be form submits. Also, it's only minimally harder for users of the site to put in whatever value they wish. Also, this wouldn't allow the user to bookmark anything, which is a major disadvantage.
@John Virgolino mentioned encrypting the entire query string, which could help with this process. However, it seems like going a little too far for most applications.

I've been reading about this, looking for a solution, but as @Kibbee says there is no real consensus.
I can think of a few possible solutions:
1) If your table uses integer keys (likely), add a check-sum digit to the identifier. That way, (simple) injection attacks will usually fail. On receiving the request, simply remove the check-sum digit, recompute it from the remaining identifier, and check that the two match - if they don't, then you know the URL has been tampered with. This method also hides your "rate of growth" (somewhat).
2) When storing the DB record initially, save a "secondary key" or value that you are happy to be a public id. This has to be unique and usually not sequential - examples are a UUID/Guid or a hash (MD5) of the integer ID e.g. http://server/item.aspx?id=AbD3sTGgxkjero (but be careful of characters that are not compatible with http). Nb. the secondary field will need to be indexed, and you will lose benefits of clustering that you get in 1).
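Option 1 can be sketched in Python as follows. The answer does not say which checksum to use; this example picks the Luhn algorithm (the one used for credit card numbers) as one common choice. The function names are made up for the example, and it assumes non-negative integer ids.

```python
def luhn_check_digit(number):
    """Compute the Luhn check digit to append to a non-negative integer."""
    total = 0
    # Walk the digits right to left, doubling every second one,
    # starting with the rightmost digit.
    for position, char in enumerate(reversed(str(number))):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def to_public_id(record_id):
    """Append the check digit: the result is what goes in the URL."""
    return record_id * 10 + luhn_check_digit(record_id)


def from_public_id(public_id):
    """Strip and verify the check digit; return the record id or None."""
    record_id, digit = divmod(public_id, 10)
    if luhn_check_digit(record_id) != digit:
        return None
    return record_id


public = to_public_id(1234)
print(public)
print(from_public_id(public))
print(from_public_id(public + 1))
```

A single check digit only catches casual tampering: roughly one in ten guessed values still passes, so this complements an authorization check rather than replacing it.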
