Should I specify regional NTP pool or just pool.ntp.org? - ntp

From the NTP Pool Project (https://www.ntppool.org/en/use.html):
Looking up pool.ntp.org (or 0.pool.ntp.org, 1.pool.ntp.org, etc) will
usually return IP addresses for servers in or close to your country.
For most users this will give the best results.
...
You can also use the continental zones (For example europe,
north-america, oceania or asia.pool.ntp.org), and a country zone (like
ch.pool.ntp.org in Switzerland) - for all these zones, you can again
use the 0, 1 or 2 prefixes, like 0.ch.pool.ntp.org. Note, however,
that the country zone might not exist for your country, or might
contain only one or two timeservers.
I'm reading this several times and its not obvious whether I should specify the regional pool or not. I know which country my machine is in of course, so am happy to specify the pool. But it sounds like it doesn't result in better results.
Since the pool option means the protocol will select the best server after statistical analysis, I would have expected that specifying a regional pool would give a better distribution of servers from which to choose the winner. Any insights appreciated.

The country specific will contain a few servers that are only in the country zone you pick.
The region specific will contain more servers across an entire region (individual servers may be within any country)
The general pool will return lots of servers from anywhere.
There maybe less country specific that region specific servers for example the UK may have 20 servers, but Europe may have 100 servers over all the countries that make up the Europe region.
The best thing you can do is test what works best for your setup and locations. You should run a test deployment for a while and gather some stats. If you are using ntp' you can run ntpq -pcrv which will give you all the info you need.
You can also refer to my answer here as to what all the information means and how to interpret it.

Related

Can there be a postal code area which falls on to multiple time zones?

Can there be a city/place which has some postal code 'XXXX' and it falls in to 2 or more time zones?
I'm not sure if this question is even valid or not. But because of the overwhelming information regarding locations and time zones in the Internet I got a bit confused.
(My requirement is actually to store user's country, city, postal code and time zone information in a database)
Yes. In theory, postal code boundaries and time zone boundaries are separate entities, and thus their can be oddities like one postal code with multiple time zones. It is possible.
However, it is hard to find real-world examples. Cities that straddle official boundaries of state, county or country borders tend to have postal codes split along such boundaries, and may use a single time zone by convention - even if it's not the "official" one. (Though, that doesn't mean there aren't any.)
Additionally, consider that many postal codes are not directly related to physical location, but rather are used for routing mail in a particular way. For example, several postal codes in the US are used for sending mail to members of various branches of the military - even if they happen to be stationed overseas. Therefore, not every postal code can be mapped to a time zone.
A better approach is to use the information you have to approximate a latitude and longitude. This process is called "geocoding" and there are many online services that can perform this (including Google Maps). Then, use one of the techniques listed here (including Google Maps) to obtain a time zone ID for that location.

Couchbase, two user registering with same username but different datacenters?

Let's say I have two users, Alice in North America and Bob in Europe. Both want to register a new account with the same username, at the same time, on different datacenters. The datacenters are configured to replicate between each other using eventual consistency.
How can I make sure only one of them succeeds at registering the username? Keep in mind that the connection between the datacenters might even be offline at the time (worst case, but daily occurance on spotify's cassandra setup).
EDIT:
I do realize the key uniqueness is the big problem here. The thing is that I need all usernames to be unique. Imagine using twitter if you couldn't tag a specific person, but had to tag everyone with the same username.
With any eventual consistency system, and particularly in the presence of a network partition, you essentially have two choices:
Accept collisions, and pick a winner later.
Ensure you never have a collision.
In the case of Couchbase:
For (1) that means letting two users register with the same address in both NA and EU, and then later picking one as the "winner" (when the network link is present - not a very desirable outcome for something like a user account. A slight variation on this would be something like #Robert's suggestion and putting them in a staging area (which means the account cannot be made "active" until the partition is resolved), and then telling the "winning" user they have successfully registered, and the "loser" that the name is taken and to try again.
For (2) this means making the users unique, even though they pick the same username - for example adding a NA:: / EU:: prefix to their username document. When they login the application would need some logic to try looking up both document variations - likely trying the prefix for the local region first. (This is essentially the same idea as "realms" or "servers" that many MMO games use).
There are variations of both of these, but ultimately given an AP-type system (which Couchbase across XDCR is) you've essentially chosen Availability & Partition-Tolerance over Consistancy, and hence need to reconcile that at the application layer.
Put the user name registrations into a staging table until you can perform a replication to determine if the name already exists in one of the other data centers.
You tagged Couchbase, so I will answer about that.
As long as the key for each object is different, you should be fine with Couchbase. It is the keys that would be unique and work great with XDCR. Another solution would be to have a concatenated key made up of the username and other values (company name, etc) if that suits your use case, again giving you a unique key for the object. Yet another would be to have a key/value in a JSON document that is the username.
It's not clear to me whether you're using Cassandra or Couchbase.
As far as Cassandra is concerned, since version 2.0, you can use Lightweight Transactions which are created for the goal. A Serial Consistency has been created just to achieve what you need. In the above link you can read what follows:
For example, suppose that I have an application that allows users to
register new accounts. Without linearizable consistency, I have no way
to make sure I allow exactly one user to claim a given account — I
have a race condition analogous to two threads attempting to insert
into a [non-concurrent] Map: even if I check for existence before
performing the insert in one thread, I can’t guarantee that no other
thread inserts it after the check but before I do.
As far as the missing connection between two or more cluster its your choice how to handle it. If you can't guarantee the uniqueness at insert-time you can both refuse the registration or dealing with it, accepting and apologize later.
HTH, Carlo

How do I structure multiple Identity Data in a database

Am designing a database for a credit bureau and am seeking some guidance.
The data they receive from Banks, MFIs, Saccos, Utility companies etc comes with various types of IDs. E.g. It is perfectly legal to open a bank account with a National ID and also a Passport. Scenario One that has my head banging is that Customer1 will take a credit facility (call it loan for now) in bank1 with the passport and then go to bank2 and take another loan with their NationalID and Bank3 with their MilitaryID. Eventually when this data comes from the banks to the bureau, it would be seen as 3 different people while we know that its actually 1 person. At this point, there is nothing we can do as a bureau.
However, one way out (for now) is using the Govt registry which provides a repository which holds both passports and IDS. So once we query for this information and get a response, how do I show in the DB that Passport_X is related to NationalID_Y and MilitaryNumber_Z?
Again, a person's name could be captured in various orders states. Bank1 could do FName, LName, OName while Bank3 can do LName, FName only. How do I store this names?
Even against one ID type e.g. NationalID, you will often find misspellt names or missing names. So one NationalID in our database could end up with about 6 different names because the person's name was captured different by the various banks where he has transacted.
And that is just the tip of the iceberg. We have issues with addresses, telephone numbers, etc etc.
Could you have any insight as to how I'd structure my database to ensure we capture all data from all banks and provide the most accurate information possible regarding an individual? Better yet, do you have experience with this type of setup?
Thanks.
how do I show in the DB that Passport_X is related to NationalID_Y and MilitaryNumber_Z?
Trivial.
You ahve an identity table, that has an AlternateId field if the Identity is linked to another one. Use the first IDentity you created as master. Any alternative will have AlternateId pointing to it.
You need to separate the identity from the data in it, so you can have alterante versions of it, possibly with an origin and timestampt. You need oto likely fully support versioning and tying different identities to each other as alternative, including generating a "master identity" possibly by algorithm with the "official" version of your data (i.e. consolidated).
The details are complex - mostly you ahve to make a LOT of compromises without killing performance, so at the end HIRE A SPECIALIST. There is a reason there are people out as sensior database designers or architects that have 20+ years experience finding the optimal solution given the constrints you may not even be aware of (application wise).
Better yet, do you have experience with this type of setup?
Yes. Try financial information. Stock symbols / feeds / definitions are not necessariyl compatible and vary by whom you get it. Any non-trivial setup has different data feeds that may show the same item slightly different, sometimes in error. DIfferent name, sometimes different price (example: ES, CME group, is 50 USD per point, but on TT Fix it is 5 - to make up, the price is multiplied by 10, so instad of 1000.25 you get 10002.5). THis is the same line of consolidation, and it STINKS.
Tons of code, tons of proper database design, redoing it half a dozen time to get the proper performance. THis is tricky, sadly.

Is there a public geolocation system that I can query for names of local cities?

I'm looking to attempt to simplify address entry into a system where the city textbox has autosuggest initially populated by the user's geolocation. In the past it has seemed that autosuggesting the city name is prohibitively costly without knowing the province/state/country first but it doesn't make sense to require the user to enter the address backwards as we don't think about address information this way. On the other hand, not autosuggesting the city name means we end up with all sorts of weird and wonderful entries for mis-spelled cities from around the world.
I was wondering if there's a service that I can query that would automatically respond with the most appropriate city names according to not only what the user enters in the textbox, but the location of that user based on the country and political boundary they fall within?
For instance, if I am in Canada [as I am] and I enter 'Mi' then I'd be presented with all cities within Canada starting with 'Mi' until it was determined that the information I was entering wasn't Canadian at which point, it would use the next most likely configured country based on our usage pattern - i.e. it would check the U.S. next, followed by Mexico and then other less likely destinations. I can write all this myself if I had the database but I don't know where I can find one and my suspicion is that it would be less scalable than querying a pre-existing service on the web.
Looks as though MaxMind offers a free database that you could download in CSV:
There's an online demo to test it a bit if you'd like, but no way to query it through a web service.
IPInfoDB also has their database available for download - they have an XML API, but it only supports looking up the city/country for a particular IP. You're trying to do something a little more wide than that, looking for every city in a particular country, with country selected based on IP. I wouldn't expect that there's a web service for that, it's a pretty specific requirement.
Edited to add: You could use the IPInfoDB API to look up the country though, and then generate the autocomplete suggestions from a local country/city database. That way all the IP-geolocation wouldn't need to be done locally. There are various places that you can get a list of cities in a particular country. For example, here's some comprehensive lists maintained by the National Geospatial-Intelligence Agency

What is the "best" way to store international addresses in a database?

What is the "best" way to store international addresses in a database? Answer in the form of a schema and an explanation of the reasons why you chose to normalize (or not) the way you did. Also explain why you chose the type and length of each field.
Note: You decide what fields you think are necessary.
Plain freeform text.
Validating all the world's post/zip codes is too hard; a fixed list of countries is too politically sensitive; mandatory state/region/other administrative subdivision is just plain inappropriate (all too often I'm asked which county I live in--when I don't, because Greater London is not a county at all).
More to the point, it's simply unnecessary. Your application is highly unlikely to be modelling addresses in any serious way. If you want a postal address, ask for the postal address. Most people aren't so stupid as to put in something other than a postal address, and if they do, they can kiss their newly purchased item bye-bye.
The exception to this is if you're doing something that's naturally constrained to one country anyway. In this situation, you should ask for, say, the { postcode, house number } pair, which is enough to identify a postal address. I imagine you could achieve similar things with the extended zip code in the US.
In the past I've modeled forms that needed to be international after the ups/fedex shipping address forms on their websites (I figured if they don't know how to handle an international order we are all hosed). The fields they use can be used as reference for setting up your schema.
In general, you need to understand why you want an address. Is it for shipping/mailing? Then there is really only one requirement, have the country separate. The other lines are freeform, to be filled in by the user. The reason for this is the common forwarding strategy for mail : any incoming mail for a foreign country is forwarded without looking at the other address lines. Hence, the detailed information is parsed only by the mail sorter located in the country itself. Like the receiver, they'll be familiar with national conventions.
(UPS may bunch together some small European countries, e.. all the Low Countries are probably served from Belgium - the idea still holds.)
I think adding country/city and address text will be fine. country and city should be separate for reporting. Managers always ask for these kind of reports which you do not expect and I dont prefer running a LIKE query through a large database.
Not to give Facebook undue respect. However, the overall structure of the database seems to be overlooked in many web applications launching every day. Obviously I don't think there is a perfect solution that covers all the potential variables with address structure without some hard work. That said, combined with autocomplete Facebook manages to take location input data and eliminate a majority of their redundant entries. They do this by organizing their database well enough to provide autocomplete information in a low cost, low error way to the client in real time allowing them to more or less choose the correct location from an existing list.
I think the best solution is to access a third party database which contains your desired geographic scope and use it to initially seed your user location information. This will allow you to avoid doing the groudwork of creating your own. With any luck you can reduce the load on your server by allowing your new users to receive the correct autocomplete information directly off your third party supplier. Eventually you will be able to fill most autocomplete for location information such as city, country, etc. from information contained in your own database from user input data.
You need to provide a bit more details about how you are planning to use the data. For example, fields like City, State, Country can either be text in the single table, or be codes which are linked to a separate table with a Foreign Key.
Simplest would be
Address_Line_01 (Required, Non blank)
Address_Line_02
Address_Line_03
Landmark
City (Required)
Pin (Required)
Province_District
State (Required)
Country (Required)
All the above can be Text/Unicode with appropriate field lengths.
Phone Numbers as applicable.

Resources