Adding additional data fields to account information in Substrate - cryptocurrency

Very new to Substrate and Rust. My understanding of the ChainState is that it acts sort of like a database which holds account numbers (in this case public keys) and their associated balances. When making a transaction, Substrate basically checks to see that you have a sufficient balance, and if so, the transaction succeeds. (This is different from the UTXO method used in Bitcoin.)
First of all, if I am wrong on the above, please correct me.
If I am correct (or at least close) I would like to find a method for associating other data with each account. I've noticed that in the demos, accounts are also associated with names, like Alice, Bob, etc. Is this kept in the ChainState, or is this something which would only be stored on one's own node?
I am trying to determine a way to associate additional data with accounts in the ChainState. For example, how could I store a name (like Alice, Bob, etc.) in the ChainState (assuming that they are only stored locally) or even other information, such as the birthday of the account owner, or their favorite author, or whatever arbitrary information?

The Chain State is just the state of everything, not necessarily connected to Account IDs. It does, among other things, store balances and such, yes, but also many other things that the chain stored one way or another.
To add custom data, you would create a new structure (map) and then map account IDs to whatever data you want. As an example:
decl_storage! {
trait Store for Module<T: Trait> as TemplateModule {
/// The storage item for our proofs.
/// It maps a proof to the user who made the claim and when they made it.
Proofs: map hasher(blake2_128_concat) Vec<u8> => (T::AccountId, T::BlockNumber);
}
}
The above declares a storage map which will associate a hash with a tuple of Account and Block Number. That way, querying the hash will return those two values. You could also do the reverse and associate an AccountID with some other value, like a string (Vec<u8>).
I recommend going through this tutorial from which I took the above snippet: it will show you exactly how to add custom information into a chain.

The answer given by #Swader was very good, as it was general in scope. I will be looking into this answer more, as I try to associate more types of information. (I voted it up, but my vote isn't visible because I am relatively new to StackOverflow, at least on this account.)
After a bit more searching I also found this tutorial: Add a Pallet to Your Runtime.
This pallet happens to specifically add the ability to associate a nickname with the account ID, which was the example I gave in my question. #Swader's answer, however, was more general, and therefore both more useful and also more closely answered my question.
By the way, the nicknames are saved as hex encoded, and are returned as hex encoded as well. An easy way to check that the hex encoding is actually equivalent to the nickname which was set is to visit https://convertstring.com/EncodeDecode/HexDecode and paste in the hex string, without the initial 0x.

Related

Google App Engine list of ReferenceProperties vs strings

Say I have a Club entity and a User entity. A Club has a list of members.
Say now that I want to get all Clubs that a User is a part of. There are two ways to do this, each involving a list of members:
The list can be a list of strings which are the email addresses of the members. When I want to get the Clubs a User is a part of, I would do clubQuery.filter('emailAddresses =', userEmail).
The list can be a list of ReferenceProperties, where each item is a reference to the User entity. So I would do clubQuery.filter('userReferences =', user_key)
Which would be the better option to go, and why? Or are there really no differences between the two?
Depending on the average length of your emails, you might save some storage space by storing the emails, they keys may be longer. Or not.
When you do a datastore get by key, you can have a strongly consistent read, rather than the eventually consistent result you get from the filter operation. If you store the keys, you have the option of doing this when necessary.
It sounds like you have a third option, which is to use the email address as the key; you'll get the best of both worlds that way.
Edit I guess I wasn't really answering your question. Here's a better answer.
Option 1 may save you storage space and thus a bit of cost. It's small though, maybe insignificant.
Option 2 makes it easier to write code on the Club side because you get automatic dereferencing.
It doesn't make a big difference on how you query what Clubs a user is part of.
If you use my suggestion for email as a key, it could allow you fetch User entities in a consistent matter, which is a benefit you didn't ask for.

Cheap way of modeling friend relationships

I have an app engine project (java) which has a User class. I'd like to model a cheap friend relationship system. Each user can have max 50 friends. I am thinking of doing something zany like:
class User {
String username;
Text friends; // "joe,mary,frank,pete"
}
Where "friends" is a comma delimited list of usernames that the user is friends with. Here are the operations I want to support and how I'd do them with the above:
Fetch my full friend list
Just look up the User object, return back the comma delimited list of friends.
Add a friend
Fetch my user object, check if the target name exists in the string, if not, append to the end. Persist modified User object back to data store.
Delete a friend
Fetch my user object, check if the target name exists in the string, if it does, delete it from the string. Persist modified User object back to data store.
Are two users mutual friends
Fetch both user objects, check that the usernames appear on one another's user object.
Getting a full list of friends is pretty important for my application, and storing each relationship as a separate entity seems nightmarish to me (fetching each entity from the datastore when a user needs to see their friends list would probably bankrupt me). I am hoping that a simple read from the Text attribute would be much more lightweight.
Checking for the mutual friend scenario seems like the biggest drawback here, but won't happen that often. I don't know if fetching the two User objects from the datastore and doing the string comparisons would be disastrous performance-wise. Might be ok? I think I may have also read that creating and deleting objects from the data store costs more than just modifying an existing object. So the add/delete friend operations might be better this way too.
Would be happy to hear any thoughts on this or more optimal ways of going about it.
Thank you
-------------------------- Update ---------------------
As per Adrian's comment, I could also do the following:
class User {
String username;
List<String> friends;
// or //
Set<String> friends;
}
So I think if I use a List, those entities will get indexed by default. I'm not sure if I could execute a GQL query knowing that the lists are indexed to get a match without actually fetching any entities. Something like:
SELECT COUNT FROM User WHERE
(username = "me" && friends = "bob") &&
(username = "bob" && friends = "me")
Storing as a Set would help do the search faster if I loaded both User objects, but I think for both the List and Set, extra time has to be taken to deserialize them when fetched from the datastore, so not sure if their benefits are negated. Maybe it would hurt more than it'd help?
I would actually suggest you store the data in two forms. First, a list of the usernames, and secondly, a matching list of datastore keys for those users' own entities.
This will allow you to both quickly display a user's friends, and look up one particular friend to check for a mutual relationship. In particular, it will almost certainly be more efficient to check the friend's list of friend keys for the original user's keys than to match on a string.
The only drawback will be keeping the two lists in sync, but given your list of operations that doesn't sound too hard.
List<String> friends; is a good solution and one that I've seen in professional use. If friends have User IDs of your app or google you can use that data type for a list of keys instead.

Best practices for consistent and comprehensive address storage in a database [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
Are there any best practices (or even standards) to store addresses in a consistent and comprehensive way in a database ?
To be more specific, I believe at this stage that there are two cases for address storage :
you just need to associate an address to a person, a building or any item (the most common case). Then a flat table with text columns (address1, address2, zip, city) is probably enough. This is not the case I'm interested in.
you want to run statistics on your addresses : how many items in a specific street, or city or... Then you want to avoid misspellings of any sorts, and ensure consistency. My question is about best practices in this specific case : what are the best ways to model a consistent address database ?
A country specific design/solution would be an excellent start.
ANSWER : There does not seem to exist a perfect answer to this question yet, but :
xAL, as suggested by Hank, is the closest thing to a global standard that popped up. It seems to be quite an overkill though, and I am not sure many people would want to implement it in their database...
To start one's own design (for a specific country), Dave's link to the Universal Postal Union (UPU) site is a very good starting point.
As for France, there is a norm (non official, but de facto standard) for addresses, which bears the lovely name of AFNOR XP Z10-011 (french only), and has to be paid for. The UPU description for France is based on this norm.
I happened to find the equivalent norm for Sweden : SS 613401.
At European level, some effort has been made, resulting in the norm EN 14142-1. It is obtainable via CEN national members.
I've been thinking about this myself as well. Here are my loose thoughts so far, and I'm wondering what other people think.
xAL (and its sister that includes personal names, XNAL) is used by both Google and Yahoo's geocoding services, giving it some weight. But since the same address can be described in xAL in many different ways--some more specific than others--then I don't see how xAL itself is an acceptable format for data storage. Some of its field names could be used, however, but in reality the only basic format that can be used among the 16 countries that my company ships to is the following:
enum address-fields
{
name,
company-name,
street-lines[], // up to 4 free-type street lines
county/sublocality,
city/town/district,
state/province/region/territory,
postal-code,
country
}
That's easy enough to map into a single database table, just allowing for NULLs on most of the columns. And it seems that this is how Amazon and a lot of organizations actually store address data. So the question that remains is how should I model this in an object model that is easily used by programmers and by any GUI code. Do we have a base Address type with subclasses for each type of address, such as AmericanAddress, CanadianAddress, GermanAddress, and so forth? Each of these address types would know how to format themselves and optionally would know a little bit about the validation of the fields.
They could also return some type of metadata about each of the fields, such as the following pseudocode data structure:
structure address-field-metadata
{
field-number, // corresponds to the enumeration above
field-index, // the order in which the field is usually displayed
field-name, // a "localized" name; US == "State", CA == "Province", etc
is-applicable, // whether or not the field is even looked at / valid
is-required, // whether or not the field is required
validation-regex, // an optional regex to apply against the field
allowed-values[] // an optional array of specific values the field can be set to
}
In fact, instead of having individual address objects for each country, we could take the slightly less object-oriented approach of having an Address object that eschews .NET properties and uses an AddressStrategy to determine formatting and validation rules:
object address
{
set-field(field-number, field-value),
address-strategy
}
object address-strategy
{
validate-field(field-number, field-value),
cleanse-address(address),
format-address(address, formatting-options)
}
When setting a field, that Address object would invoke the appropriate method on its internal AddressStrategy object.
The reason for using a SetField() method approach rather than properties with getters and setters is so that it is easier for code to actually set these fields in a generic way without resorting to reflection or switch statements.
You can imagine the process going something like this:
GUI code calls a factory method or some such to create an address based on a country. (The country dropdown, then, is the first thing that the customer selects, or has a good guess pre-selected for them based on culture info or IP address.)
GUI calls address.GetMetadata() or a similar method and receives a list of the AddressFieldMetadata structures as described above. It can use this metadata to determine what fields to display (ignoring those with is-applicable set to false), what to label those fields (using the field-name member), display those fields in a particular order, and perform cursory, presentation-level validation on that data (using the is-required, validation-regex, and allowed-values members).
GUI calls the address.SetField() method using the field-number (which corresponds to the enumeration above) and its given values. The Address object or its strategy can then perform some advanced address validation on those fields, invoke address cleaners, etc.
There could be slight variations on the above if we want to make the Address object itself behave like an immutable object once it is created. (Which I will probably try to do, since the Address object is really more like a data structure, and probably will never have any true behavior associated with itself.)
Does any of this make sense? Am I straying too far off of the OOP path? To me, this represents a pretty sensible compromise between being so abstract that implementation is nigh-impossible (xAL) versus being strictly US-biased.
Update 2 years later: I eventually ended up with a system similar to this and wrote about it at my defunct blog.
I feel like this solution is the right balance between legacy data and relational data storage, at least for the e-commerce world.
I'd use an Address table, as you've suggested, and I'd base it on the data tracked by xAL.
In the UK there is a product called PAF from Royal Mail
This gives you a unique key per address - there are hoops to jump through, though.
I basically see 2 choices if you want consistency:
Data cleansing
Basic data table look ups
Ad 1. I work with the SAS System, and SAS Institute offers a tool for data cleansing - this basically performs some checks and validations on your data, and suggests that "Abram Lincoln Road" and "Abraham Lincoln Road" be merged into the same street. I also think it draws on national data bases containing city-postal code matches and so on.
Ad 2. You build up a multiple choice list (ie basic data), and people adding new entries pick from existing entries in your basic data. In your fact table, you store keys to street names instead of the street names themselves. If you detect a spelling error, you just correct it in your basic data, and all instances are corrected with it, through the key relation.
Note that these options don't rule out each other, you can use both approaches at the same time.
In the US, I'd suggest choosing a National Change of Address vendor and model the DB after what they return.
The authorities on how addresses are constructed are generally the postal services, so for a start I would examine the data elements used by the postal services for the major markets you operate in.
See the website of the Universal Postal Union for very specific and detailed information on international postal address formats:http://www.upu.int/post_code/en/postal_addressing_systems_member_countries.shtml
"xAl is the closest thing to a global standard that popped up. It seems to be quite an overkill though, and I am not sure many people would want to implement it in their database..."
This is not a relevant argument. Implementing addresses is not a trivial task if the system needs to be "comprehensive and consistent" (i.e. worldwide). Implementing such a standard is indeed time consuming, but to meet the specified requirement nevertheless mandatory.
normalize your database schema and you'll have the perfect structure for correct consistency. and this is why:
http://weblogs.sqlteam.com/mladenp/archive/2008/09/17/Normalization-for-databases-is-like-Dependency-Injection-for-code.aspx
I asked something quite similar earlier: Dynamic contact information data/design pattern: Is this in any way feasible?.
The short answer: Storing adderres or any kind of contact information in a database is complex. The Extendible Address Language (xAL) link above has some interesting information that is the closest to a standard/best practice that I've come accross...

What are some techniques for stored database keys in URL

I have read that using database keys in a URL is a bad thing to do.
For instance,
My table has 3 fields: ID:int, Title:nvarchar(5), Description:Text
I want to create a page that displays a record. Something like ...
http://server/viewitem.aspx?id=1234
First off, could someone elaborate on why this is a bad thing to do?
and secondly, what are some ways to work around using primary keys in a url?
I think it's perfectly reasonable to use primary keys in the URL.
Some considerations, however:
1) Avoid SQL injection attacks. If you just blindly accept the value of the id URL parameter and pass it into the DB, you are at risk. Make sure you sanitise the input so that it matches whatever format of key you have (e.g. strip any non-numeric characters).
2) SEO. It helps if your URL contains some context about the item (e.g. "big fluffy rabbit" rather than 1234). This helps search engines see that your page is relevant. It can also be useful for your users (I can tell from my browser history which record is which without having to remember a number).
It's not inherently a bad thing to do, but it has some caveats.
Caveat one is that someone can type in different keys and maybe pull up data you didn't want / expect them to get at. You can reduce the chance that this is successful by increasing your key space (for example making ids random 64 bit numbers).
Caveat two is that if you're running a public service and you have competitors they may be able to extract business information from your keys if they are monotonic. Example: create a post today, create a post in a week, compare Ids and you have extracted the rate at which posts are being made.
Caveat three is that it's prone to SQL injection attacks. But you'd never make those mistakes, right?
Using IDs in the URL is not necessarily bad. This site uses it, despite being done by professionals.
How can they be dangerous? When users are allowed to update or delete entries belonging to them, developers implement some sort of authentication, but they often forget to check if the entry really belongs to you. A malicious user could form a URL like "/questions/12345/delete" when he notices that "12345" belongs to you, and it would be deleted.
Programmers should ensure that a database entry with an arbitrary ID really belongs to the current logged-in user before performing such operation.
Sometimes there are strong reasons to avoid exposing IDs in the URL. In such cases, developers often generate random hashes that they store for each entry and use those in the URL. A malicious person tampering in the URL bar would have a hard time guessing a hash that would belong to some other user.
Security and privacy are the main reasons to avoid doing this. Any information that gives away your data structure is more information that a hacker can use to access your database. As mopoke says, you also expose yourself to SQL injection attacks which are fairly common and can be extremely harmful to your database and application. From a privacy standpoint, if you are displaying any information that is sensitive or personal, anybody can just substitute a number to retrieve information and if you have no mechanism for authentication, you could be putting your information at risk. Also, if it's that easy to query your database, you open yourself up to Denial of Service attacks with someone just looping through URL's against your server since they know each one will get a response.
Regardless of the nature of the data, I tend to recommend against sharing anything in the URL that could give away anything about your application's architecture, it seems to me you are just inviting trouble (I feel the same way about hidden fields which aren't really hidden).
To get around it, we usaully encrypt the parameters before passing them. In some cases, the encyrpted URL also includes some form of verification/authentication mechanism so the server can decide if it's ok to process.
Of course every application is different and the level of security you want to implement has to be balanced with functionality, budget, performance, etc. But I don't see anything wrong with being paranoid when it comes to data security.
It's a bit pedantic at times, but you want to use a unique business identifier for things rather than the surrogate key.
It can be as simple as ItemNumber instead of Id.
The Id is a db concern, not a business/user concern.
Using integer primary keys in a URL is a security risk. It is quite easy for someone to post using any number. For example, through normal web application use, the user creates a user record with an ID of 45 (viewitem/id/45). This means the user automatically knows there are 44 other users. And unless you have a correct authorization system in place they can see the other user's information by created their own url (viewitem/id/32).
2a. Use proper authorization.
2b. Use GUIDs for primary keys.
showing the key itself isn't inherently bad because it holds no real meaning, but showing the means to obtain access to an item is bad.
for instance say you had an online store that sold stuff from 2 merchants. Merchant A had items (1, 3, 5, 7) and Merchant B has items (2, 4, 5, 8).
If I am shopping on Merchant A's site and see:
http://server/viewitem.aspx?id=1
I could then try to fiddle with it and type:
http://server/viewitem.aspx?id=2
That might let me access an item that I shouldn't be accessing since I am shopping with Merchant A and not B. In general allowing users to fiddle with stuff like that can lead to security problems. Another brief example is employees that can look at their personal information (id=382) but they type in someone else id to go directly to someone else profile.
Now, having said that.. this is not bad as long as security checks are built into the system that check to make sure people are doing what they are supposed to (ex: not shopping with another merchant or not viewing another employee).
One mechanism is to store information in sessions, but some do not like that. I am not a web programmer so I will not go into that :)
The main thing is to make sure the system is secure. Never trust data that came back from the user.
Everybody seems to be posting the "problems" with using this technique, but I haven't seen any solutions. What are the alternatives. There has to be something in the URL that uniquely defines what you want to display to the user. The only other solution I can think of would be to run your entire site off forms, and have the browser post the value to the server. This is a little trickier to code, as all links need to be form submits. Also, it's only minimally harder for users of the site to put in whatever value they wish. Also this wouldn't allow the user to bookmark anything, which is a major disadvantage.
#John Virgolino mentioned encrypting the entire query string, which could help with this process. However it seems like going a little too far for most applications.
I've been reading about this, looking for a solution, but as #Kibbee says there is no real consensus.
I can think of a few possible solutions:
1) If your table uses integer keys (likely), add a check-sum digit to the identifier. That way, (simple) injection attacks will usually fail. On receiving the request, simply remove the check-sum digit and check that it still matches - if they don't then you know the URL has been tampered with. This method also hides your "rate of growth" (somewhat).
2) When storing the DB record initially, save a "secondary key" or value that you are happy to be a public id. This has to be unique and usually not sequential - examples are a UUID/Guid or a hash (MD5) of the integer ID e.g. http://server/item.aspx?id=AbD3sTGgxkjero (but be careful of characters that are not compatible with http). Nb. the secondary field will need to be indexed, and you will lose benefits of clustering that you get in 1).

What is the "best" way to store international addresses in a database?

What is the "best" way to store international addresses in a database? Answer in the form of a schema and an explanation of the reasons why you chose to normalize (or not) the way you did. Also explain why you chose the type and length of each field.
Note: You decide what fields you think are necessary.
Plain freeform text.
Validating all the world's post/zip codes is too hard; a fixed list of countries is too politically sensitive; mandatory state/region/other administrative subdivision is just plain inappropriate (all too often I'm asked which county I live in--when I don't, because Greater London is not a county at all).
More to the point, it's simply unnecessary. Your application is highly unlikely to be modelling addresses in any serious way. If you want a postal address, ask for the postal address. Most people aren't so stupid as to put in something other than a postal address, and if they do, they can kiss their newly purchased item bye-bye.
The exception to this is if you're doing something that's naturally constrained to one country anyway. In this situation, you should ask for, say, the { postcode, house number } pair, which is enough to identify a postal address. I imagine you could achieve similar things with the extended zip code in the US.
In the past I've modeled forms that needed to be international after the ups/fedex shipping address forms on their websites (I figured if they don't know how to handle an international order we are all hosed). The fields they use can be used as reference for setting up your schema.
In general, you need to understand why you want an address. Is it for shipping/mailing? Then there is really only one requirement, have the country separate. The other lines are freeform, to be filled in by the user. The reason for this is the common forwarding strategy for mail : any incoming mail for a foreign country is forwarded without looking at the other address lines. Hence, the detailed information is parsed only by the mail sorter located in the country itself. Like the receiver, they'll be familiar with national conventions.
(UPS may bunch together some small European countries, e.. all the Low Countries are probably served from Belgium - the idea still holds.)
I think adding country/city and address text will be fine. country and city should be separate for reporting. Managers always ask for these kind of reports which you do not expect and I dont prefer running a LIKE query through a large database.
Not to give Facebook undue respect. However, the overall structure of the database seems to be overlooked in many web applications launching every day. Obviously I don't think there is a perfect solution that covers all the potential variables with address structure without some hard work. That said, combined with autocomplete Facebook manages to take location input data and eliminate a majority of their redundant entries. They do this by organizing their database well enough to provide autocomplete information in a low cost, low error way to the client in real time allowing them to more or less choose the correct location from an existing list.
I think the best solution is to access a third party database which contains your desired geographic scope and use it to initially seed your user location information. This will allow you to avoid doing the groudwork of creating your own. With any luck you can reduce the load on your server by allowing your new users to receive the correct autocomplete information directly off your third party supplier. Eventually you will be able to fill most autocomplete for location information such as city, country, etc. from information contained in your own database from user input data.
You need to provide a bit more details about how you are planning to use the data. For example, fields like City, State, Country can either be text in the single table, or be codes which are linked to a separate table with a Foreign Key.
Simplest would be
Address_Line_01 (Required, Non blank)
Address_Line_02
Address_Line_03
Landmark
City (Required)
Pin (Required)
Province_District
State (Required)
Country (Required)
All the above can be Text/Unicode with appropriate field lengths.
Phone Numbers as applicable.

Resources