What Data Structure is best suited to store differences - database

I have a set of predefined properties I want to store. For example:
PersonNr,Gender,Name,Surname, Address, Zip,City.
Now I have different sources for these data sets, which share the PersonNr but have different values for the other properties:
EXAMPLE:
From Database A I get
123456,M,Hudson,James,Fakestr 123, 12345, West City
From Database B I get
123456,M,Hudson,Jameson,Fakestr123, 12345, East City
Instead of storing both values, I want to store the data from Database A as a reference and only store the data from B that differs from A.
In my example I would like to store something like:
Database B, Jameson, East City
What data structure can I use for the given problem?
Thanks in advance

The solution you choose depends a lot on the nature of your data, how you're going to store it, and what you want to do with it. If all you want is an abbreviated record that only stores the deltas, then you could write a comma-separated line that has empty fields. That is, given:
Database A
123456,M,Hudson,James,Fakestr 123, 12345, West City
Database B
123456,M,Hudson,Jameson,Fakestr123, 12345, East City
You could write a separate record showing the deltas:
123456,,,Jameson,,,East City
If you're storing the deltas in a database, then you'll probably want records that give the record identifier, the field name, and the changed value. That representation would be:
123456,Surname,Jameson
123456,City,East City
That's probably how I'd represent it in memory, too: a hash map keyed by record identifier (e.g. 123456), with a list of field name / value pairs for each ID.
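As a rough illustration of the in-memory representation described above, here is a minimal Python sketch. The helper names parse and diff are made up for this example, and the field list is taken from the question. Note that a strict string comparison also flags Address, because "Fakestr 123" and "Fakestr123" differ by a space; normalise the values first if that is not wanted.

```python
FIELDS = ["PersonNr", "Gender", "Name", "Surname", "Address", "Zip", "City"]

def parse(line):
    """Split a comma-separated record and map it onto FIELDS (hypothetical helper)."""
    values = [v.strip() for v in line.split(",")]
    return dict(zip(FIELDS, values))

def diff(reference, other, key="PersonNr"):
    """Return {field: value} for every field where `other` differs from `reference`."""
    return {f: v for f, v in other.items() if f != key and reference.get(f) != v}

record_a = parse("123456,M,Hudson,James,Fakestr 123, 12345, West City")
record_b = parse("123456,M,Hudson,Jameson,Fakestr123, 12345, East City")

# Hash map keyed by record identifier, holding only the changed fields.
deltas = {record_b["PersonNr"]: diff(record_a, record_b)}

# Overlaying the delta on the reference record reconstructs the record from B.
rebuilt_b = {**record_a, **deltas["123456"]}
```

Storing the delta as a dict per ID keeps lookups cheap and maps directly onto the (record identifier, field name, changed value) rows suggested for a database.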

Related

Sql Search query within range of data stored in groups or categories

I need to store data about activity houses for youngsters, which should be stored with ranges like 15-19, 12-20, 10-23, 10-19, etc.
I then want to be able to search for the one or more houses that match an incoming query.
For example, if someone searches for activity houses that are suitable for 20, it should return the houses that have 12-20 and 10-23.
Thanks in advance.
Create your table with two columns representing the allowed range, e.g. minCapacity and maxCapacity.
Then you can structure your query using BETWEEN, e.g.
... WHERE @request BETWEEN minCapacity AND maxCapacity
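A minimal sketch of that query, using Python's built-in sqlite3 module rather than any particular production database. The table name ActivityHouse and the house names are invented for illustration; the column names follow the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ActivityHouse (name TEXT, minCapacity INTEGER, maxCapacity INTEGER)"
)
# The four ranges from the question; the house names are hypothetical.
conn.executemany(
    "INSERT INTO ActivityHouse VALUES (?, ?, ?)",
    [("House A", 15, 19), ("House B", 12, 20), ("House C", 10, 23), ("House D", 10, 19)],
)

def find_houses(value):
    """Return the names of houses whose range contains `value` (bounds inclusive)."""
    rows = conn.execute(
        "SELECT name FROM ActivityHouse "
        "WHERE ? BETWEEN minCapacity AND maxCapacity ORDER BY name",
        (value,),
    )
    return [row[0] for row in rows]

matches = find_houses(20)  # the 12-20 and 10-23 houses
```

BETWEEN is inclusive at both ends, which is why the 12-20 house matches a request for 20.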

How to find the name of a place from latitude and longitude value

I am analysing app data which contains the latitude and longitude of the places a user visited. I was able to export the data to Tableau and plot it on a map, but I want to find the name of the place for each pair of lat and lon.
One solution could be, if I get a table of three columns (Lat, Lon, Place) then I can join it with my user data table to find the name of a place at a given Lat and Lon.
My question is, is there a ready-made table with the above three columns which I can import into SQL Server? I am interested in places in the UK or London. Is there any other approach to achieve this?
You can get this from the Ordnance Survey, which should give you lat, long and postcode:
https://www.ordnancesurvey.co.uk/business-and-government/products/code-point-open.html
You'll then need another data source to map the postcode to a location name (e.g. town, county, etc.). See the similar post below:
Where can I find a list of all UK _full_ postcodes including street name and their precise coordinates?
It might take a little fiddling about, and you're always going to have the issue with data being a little out of date but it should be good enough.
I wrote an API wrapper in R for postcodes.io, which is a free UK postcode database. Check the original documentation if you want to create an API wrapper in your language of choice. Wrappers in languages other than R are also available.
If you use R, you can get the place names in the following way:
if (!require("devtools")) install.packages("devtools")
devtools::install_github("erzk/PostcodesioR")
library(PostcodesioR)
rev_geo <- reverse_geocoding(0.127, 51.507)
It will return a list with extensive information about the latitude and longitude, e.g. wards, NUTS, administrative district, county, parish, constituency, CCG and many more.
There is also a bulk_reverse_geocoding() function which takes several lat and lon inputs.
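For readers not using R, the same lookup can be sketched in Python against the postcodes.io reverse-geocoding endpoint directly. This is only a rough sketch: the function names are made up, the sample coordinates are illustrative, and the assumed shape of the JSON response (a "result" list, or null when nothing is nearby) should be checked against the postcodes.io documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.postcodes.io/postcodes"

def reverse_geocode_url(lon, lat):
    """Build the reverse-geocoding request URL for one coordinate pair."""
    return BASE_URL + "?" + urlencode({"lon": lon, "lat": lat})

def reverse_geocode(lon, lat):
    """Fetch the nearest postcode records for a coordinate pair (needs network access)."""
    with urlopen(reverse_geocode_url(lon, lat)) as response:
        payload = json.load(response)
    # Assumption: "result" holds a list of postcode records, or null if none are nearby.
    return payload.get("result") or []

# Illustrative coordinates; only the URL is built here, no request is sent.
url = reverse_geocode_url(-0.127, 51.507)
```

Each returned record could then be joined back to the user data on the coordinate pair that produced it.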

MS Access: Search through Table for the same part of the String

I am trying to loop through the values of a column in my table.
I have a registration form which gives each user a UNIQUE ID based on their information.
For example:
Country = Austria
Each user who selects the country Austria gets the value assigned to that country (let's say 00).
Account IDs look like this:
XXXX00UNIQUECODE
Each country has its own unique value (AT = 00, DE = 01, etc.).
Now I want to generate a UNIQUE CODE for each user that is simply the previous UNIQUE CODE stored in the table for the same country, incremented by 1.
In order to do that, I need to somehow loop through the column where the Account IDs are stored and search for matches.
The thing is, when a user tries to generate the UNIQUE CODE, he does not have it yet, so he only has:
XXXX00
Now I need to find all the XXXX00 strings in my AccountID column and store them in an array, then find the max value of those and increment it.
But I don't know how to search for part of a string inside a column of the table.
Just the XXXX00 part, not the entire Account ID XXXX00UNIQUECODE.
I hope you can understand me. It's quite complicated, I know, but I'm really stuck here. Hopefully someone will know what I mean and maybe even find a smoother solution for this.
Thanks in advance!
You're pounding a square peg into a round hole. Why not just create a new column called UserID and then you can do:
SELECT Max(UserID) FROM MyTable WHERE Mid(AccountID, 5, 2) = "00"
and increment it by 1.
Better yet, store CountryCode, UserID and the XXXX part in separate fields, and index them. It'll save time when you search or filter, which I assume you're going to be doing.
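The separate-fields suggestion can be sketched as follows, using Python's sqlite3 module as a stand-in for Access. The table name Accounts, the column name Prefix and both helper functions are hypothetical; only CountryCode and UserID come from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Accounts (
        Prefix      TEXT NOT NULL,     -- the XXXX part
        CountryCode TEXT NOT NULL,     -- e.g. '00' for AT, '01' for DE
        UserID      INTEGER NOT NULL,  -- per-country counter
        UNIQUE (CountryCode, UserID)   -- also gives an index for the lookup
    )
""")
conn.executemany(
    "INSERT INTO Accounts VALUES (?, ?, ?)",
    [("XXXX", "00", 1), ("XXXX", "00", 2), ("XXXX", "01", 1)],
)

def next_user_id(country_code):
    """Return the highest UserID stored for the country plus one (1 if none exist)."""
    row = conn.execute(
        "SELECT COALESCE(MAX(UserID), 0) + 1 FROM Accounts WHERE CountryCode = ?",
        (country_code,),
    ).fetchone()
    return row[0]

def account_id(prefix, country_code, user_id):
    """Assemble the display form of the account ID from its separate parts."""
    return f"{prefix}{country_code}{user_id}"

new_id = account_id("XXXX", "00", next_user_id("00"))
```

With the parts stored separately, no string searching is needed at all. If several users can register at once, reading the maximum and inserting the new row should happen in one transaction so that two users do not receive the same number.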

SOLR search array vs individual documents

I've got a business case where I need to check whether a search query is looking for businesses,
eg: q="night clubs new york"
I've got a list of countries, states, cities and regions in my database (3 million+ records), and I've got a list of business categories.
All I want to do is check whether the query contains a business category (night clubs) and the name of a city, state or country (new york). So I'm checking the number of results returned for the query below. If I get 2 numResults, then this is a business query, and I then query my Solr index to search for businesses.
query: places_ss:(night clubs new york) OR categories_ss:(night clubs new york)
Speed question: how should I save the list of cities, states and countries in Solr to get maximum search speed?
- Have one document (id:places) and add the distinct cities, states and countries to one array, places_ss.
- Have multiple documents with different ids, with 100,000 place names in each document in an array.
- Have one document or multiple documents with a place_s string (not an array), with each place separated by a space and each space within a place replaced by an underscore, eg: new york becomes new_york.
With the last option, at query time I would generate multiple combinations of night clubs new york
eg: night night_clubs night_clubs_new night_clubs_new_york clubs_new clubs_new_york new_york york and query for place.
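To make the combination step concrete, here is a small Python sketch that generates every contiguous run of words in the query, joined by underscores. The function name is made up for this example. Unlike the list above, it also emits the single words clubs and new, since those are contiguous runs too.

```python
def underscore_ngrams(query):
    """Return every contiguous run of words in `query`, joined by underscores."""
    words = query.split()
    return [
        "_".join(words[start:end])
        for start in range(len(words))
        for end in range(start + 1, len(words) + 1)
    ]

candidates = underscore_ngrams("night clubs new york")
```

A query of n words produces n * (n + 1) / 2 candidates, so four words give ten terms to look up.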
Would it be a good idea to have a separate core just for the place documents above to increase speed?
Is this a good solution?
Document organisation:
It is better to use a document approach, with fields for:
- location
- activity
- anything else you need
Location
You should store your location as a hierarchy, like this:
country:state:city:suburb... so that you can search for usa:new york:new york*
or ::new york
There is no need for underscores, so avoid them.
Activity
Activity should be stored in a separate field, for search precision and speed.

DB Design: Sort Order for Lookup Tables

I have an application where the database back-end has around 15 lookup tables. For instance there is a table for Counties like this:
CountyID(PK) County
49001 Beaver
49005 Cache
49007 Carbon
49009 Daggett
49011 Davis
49015 Emery
49029 Morgan
49031 Piute
49033 Rich
49035 Salt Lake
49037 San Juan
49041 Sevier
49043 Summit
49045 Tooele
49049 Utah
49051 Wasatch
49057 Weber
The UI for this app has a number of combo boxes in various places for these lookup tables, and my client has asked that the boxes list in this case:
CountyID(PK) County
49035 Salt Lake
49049 Utah
49011 Davis
49057 Weber
49045 Tooele
'The Rest Alphabetically
The best plan I have for accomplishing this is to add a column to each lookup table for SortOrder(numeric). I had a colleague tell me he thought that would cause the tables to violate 3rd-Normal-Form, but I think the sort order still depends on the key and only the key (even though the rest of the list is alphabetical).
Is adding the SortOrder column the best way to do this, or is there a better way I am just not seeing?
I agree with @cletus that a sort order column is a good way to go and it does not violate 3NF (because, as you said, the sort order column entries are functionally dependent on the candidate keys of the table).
I'm not sure I agree that alphanumeric is better than numeric. In the specific case of counties, there are seldom new ones created. But there is no requirement that the numbers assigned are sequential; you can allocate them with numbers that are a multiple of a hundred, for example, leaving ample room for insertions.
Yes, I agree that a sort order column is the best solution when the requirements call for a custom sort order like the one you cite. I wouldn't go with a numeric column, however. If the data is alphanumeric, the sort order should be alphanumeric. That way you can seed the value with whatever is in the county field.
If you use a numeric field you'll have to resequence the entire table (potentially) whenever you add a new entry. So:
Columns: ID, County, SortOrder
Seed:
UPDATE County SET SortOrder = CONCAT('M-', County)
and for the special cases:
UPDATE County
SET SortOrder = CONCAT('E-', County)
WHERE County IN ('Salt Lake', 'Utah', 'Davis', 'Weber', 'Tooele')
Arguably you may want to put another marker column in to indicate those entries are special.
I went with numeric and large multiples.
Even with the CONCAT('E-'.. example, I don't get the required sort order. That would give me Davis, SL, Tooele... and Salt Lake needs to be first.
I ended up using multiples of 10 and assigned the non-special-sort entries a value like 10000. That way the view for each lookup can have
ORDER BY SortOrder ASC, OtherField ASC
Another programmer suggested using DECODE in Oracle, or CASE statements in SQL Server, but this is a more general solution. YMMV.
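The numeric approach described above can be sketched like this, again using Python's sqlite3 module rather than the poster's actual database. Only a handful of the counties are loaded, and the exact values (10, 20, ... for the special entries, 10000 for the rest) are one possible assignment, not the poster's actual data.

```python
import sqlite3

DEFAULT_SORT = 10000  # shared value for every entry without a special position
special_order = ["Salt Lake", "Utah", "Davis", "Weber", "Tooele"]
counties = [
    (49001, "Beaver"), (49005, "Cache"), (49011, "Davis"), (49035, "Salt Lake"),
    (49045, "Tooele"), (49049, "Utah"), (49057, "Weber"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE County (CountyID INTEGER PRIMARY KEY, County TEXT, "
    f"SortOrder INTEGER NOT NULL DEFAULT {DEFAULT_SORT})"
)
conn.executemany("INSERT INTO County (CountyID, County) VALUES (?, ?)", counties)

# Multiples of 10 leave gaps for later insertions between the special entries.
conn.executemany(
    "UPDATE County SET SortOrder = ? WHERE County = ?",
    [(10 * (i + 1), name) for i, name in enumerate(special_order)],
)

# Special entries come first in their assigned order; ties on SortOrder
# (all the remaining counties) fall back to alphabetical order.
ordered = [
    row[0]
    for row in conn.execute("SELECT County FROM County ORDER BY SortOrder ASC, County ASC")
]
```

Because every non-special row shares the same SortOrder, adding a new ordinary county needs no resequencing: it simply takes the default and sorts alphabetically with the rest.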
