I am building an IPv4/IPv6 GeoIP database in MongoDB, and I will have millions (100+) of IPs.
The structure of the database will be:
[
  { _id: 58fdbf5c0ef8a50b4cdd9a8e, ip: '34.53.63.25', ip_hex: '0x22353f19', type: "ipv4", data : [
    {
      country : "CA",
      region : "region1",
      city : "city1",
      blacklisted : "no",
      on_date : ISODate("2022-05-05T00:00:00Z")
    },
    {
      country : "CA",
      region : "region1",
      city : "city1",
      blacklisted : "yes",
      on_date : ISODate("2022-06-05T00:00:00Z")
    },
    {
      country : "US",
      region : "region2",
      city : "city2",
      blacklisted : "no",
      on_date : ISODate("2022-05-05T00:00:00Z")
    },
    ...
  ]},
  { _id: 58fdbf5c0ef8a50b4cdd9a8e, ip: '1.2.3.4', ip_hex: '0x1020304', type: "ipv4", data : [
    {
      country : "CA",
      region : "region1",
      city : "city1",
      blacklisted : "no",
      on_date : ISODate("2022-06-05T00:00:00Z")
    },
  ]},
  { _id: 58fdbf5c0ef8a50b4cdd9a8e, ip: '2345:0425:2CA1:0000:0000:0567:5673:23b5', ip_hex: '0x234504252ca1000000000567567323b5', type: "ipv6", data : [
    {
      country : "FR",
      region : "region1",
      city : "city1",
      blacklisted : "no",
      on_date : ISODate("2022-06-05T00:00:00Z")
    },
    {
      country : "FR",
      region : "region1",
      city : "city1",
      blacklisted : "yes",
      on_date : ISODate("2022-07-05T00:00:00Z")
    },
    ...
  ]},
]
I am converting all IP strings to hex:
1.1.1.1 -> 0x1010101
1.2.3.4 -> 0x1020304
34.53.63.25 -> 0x22353f19
255.255.255.255 -> 0xffffffff
0001:0001:0001:0001:0001:0001:0001:0001 -> 0x10001000100010001000100010001
1111:1111:1111:1111:1111:1111:1111:1111 -> 0x11111111111111111111111111111111
2345:0425:2CA1:0000:0000:0567:5673:23b5 -> 0x234504252ca1000000000567567323b5
2345:0425:2CA1::0567:5673:23b5 -> 0x234504252ca1000000000567567323b5
ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff -> 0xffffffffffffffffffffffffffffffff
There will be many searches by IP, and data will be added, updated, or deleted for every IP each day.
I will search ranges of IPs, sort, update, and delete.
What index is recommended on the "ip_hex" field? I was thinking about a B-tree search on the hex value rather than on the IP string.
I want to have an efficient database. What other optimizations should I take into consideration?
Thank you.
You should pad the hex strings with leading zeros to a fixed width; otherwise sorting and range queries will not work properly, because strings are compared character by character.
Note the different formats of IPv6 addresses. The following addresses are all valid representations of the same IP:
2345:0425:2CA1:0000:0000:0567:5673:23b5
2345:0425:2CA1::0567:5673:23b5
2345:0425:2ca1:0:0:0567:5673:23b5
2345:0425:2CA1:0000:0000:567:5673:23b5
Ensure a consistent format; otherwise querying will be difficult.
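As a rough illustration of both points (fixed-width padding and one canonical form), here is a minimal Python sketch, assuming the conversion happens in application code before the document is written. The function name ip_to_padded_hex and the choice of 8 hex digits for IPv4 and 32 for IPv6 are assumptions of this sketch, not something from the question.

```python
import ipaddress

def ip_to_padded_hex(ip: str) -> str:
    """Return a fixed-width, lowercase hex string for an IPv4 or IPv6 address.

    Hypothetical helper: the name and the chosen widths are assumptions.
    """
    # ip_address() accepts any valid textual form (compressed, mixed case, ...)
    addr = ipaddress.ip_address(ip)
    # 32 bits -> 8 hex digits, 128 bits -> 32 hex digits
    width = 8 if addr.version == 4 else 32
    # Zero-pad so that string order equals numeric order within one IP version
    return format(int(addr), "0{}x".format(width))

same_ip = [
    "2345:0425:2CA1:0000:0000:0567:5673:23b5",
    "2345:0425:2CA1::0567:5673:23b5",
    "2345:0425:2ca1:0:0:0567:5673:23b5",
    "2345:0425:2CA1:0000:0000:567:5673:23b5",
]
# All four spellings collapse to a single stored value
canonical = {ip_to_padded_hex(ip) for ip in same_ip}
```

Because every spelling is parsed into a number first, the stored value no longer depends on how the address was written.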
When you talk about IP ranges, you typically mean IP subnets. To calculate subnets you need bitwise functions such as AND, OR, NOT, and XOR. At the time of writing they were not available in the MongoDB aggregation framework (see https://jira.mongodb.org/browse/SERVER-55386?filter=-2), so you need to compute them outside the database.
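One way to do the subnet arithmetic outside MongoDB is sketched below in Python, assuming ip_hex is stored as a zero-padded hex string without the 0x prefix and that the type field from the question is kept. The helper names subnet_hex_bounds and range_filter are hypothetical; the returned dictionary is only the shape of a filter you could pass to a driver's find call.

```python
import ipaddress

def subnet_hex_bounds(cidr: str):
    """Return (first, last) address of a subnet as zero-padded hex strings."""
    # strict=False tolerates host bits being set, e.g. "34.53.63.25/24"
    net = ipaddress.ip_network(cidr, strict=False)
    width = 8 if net.version == 4 else 32
    first = format(int(net.network_address), "0{}x".format(width))
    last = format(int(net.broadcast_address), "0{}x".format(width))
    return first, last

def range_filter(cidr: str) -> dict:
    """Build a query filter (assumed schema: fields 'type' and 'ip_hex')."""
    net = ipaddress.ip_network(cidr, strict=False)
    first, last = subnet_hex_bounds(cidr)
    return {
        # Restrict by version so 8-digit and 32-digit strings never mix
        "type": "ipv4" if net.version == 4 else "ipv6",
        "ip_hex": {"$gte": first, "$lte": last},
    }
```

With fixed-width values, the two bounds turn a subnet lookup into a plain range scan on an ordinary index over ip_hex.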
I'm using MongoDB to analyse a Nobel Prize dataset whose documents look like this:
> db.laureate.find().pretty().limit(1)
{
"_id" : ObjectId("604bc8c847d640142f02b3b1"),
"id" : "1",
"firstname" : "Wilhelm Conrad",
"surname" : "Röntgen",
"born" : "1845-03-27",
"died" : "1923-02-10",
"bornCountry" : "Prussia (now Germany)",
"bornCountryCode" : "DE",
"bornCity" : "Lennep (now Remscheid)",
"diedCountry" : "Germany",
"diedCountryCode" : "DE",
"diedCity" : "Munich",
"gender" : "male",
"prizes" : [
{
"year" : "1901",
"category" : "physics",
"share" : "1",
"motivation" : "\"in recognition of the extraordinary services he has rendered by the discovery of the remarkable rays subsequently named after him\"",
"affiliations" : [
{
"name" : "Munich University",
"city" : "Munich",
"country" : "Germany"
}
]
}
]
}
As you can see, the "prizes" field holds embedded documents. The query I am trying to write should find only those laureates who won two prizes (which I already know to be Marie Curie and Linus Pauling). Can you help me with that?
Thanks in advance!
The $size operator should work fine for this. You can read about it here: https://docs.mongodb.com/manual/reference/operator/query/size/
Your new query (without limit(1), so that every laureate with two prizes is returned):
db.laureate.find({prizes: {$size: 2}}).pretty()
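To make the matching rule concrete, here is a small Python emulation of what a $size filter selects: the field must be an array with exactly the given number of elements (the operator takes an exact length, not a range). This is only an in-memory sketch, not how MongoDB evaluates the query, and the sample documents are hypothetical placeholders rather than rows from the real dataset.

```python
def has_array_size(doc: dict, field: str, n: int) -> bool:
    """Emulate {field: {$size: n}}: field must be an array of exactly n items."""
    value = doc.get(field)
    return isinstance(value, list) and len(value) == n

# Hypothetical, heavily trimmed documents (not taken from the real dataset)
laureates = [
    {"surname": "A", "prizes": [{"category": "physics"}]},
    {"surname": "B", "prizes": [{"category": "physics"}, {"category": "chemistry"}]},
    {"surname": "C", "prizes": [{"category": "chemistry"}, {"category": "peace"}]},
]

# Keep only the documents whose "prizes" array holds exactly two entries
two_time_winners = [d["surname"] for d in laureates if has_array_size(d, "prizes", 2)]
```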
I would like to get the user's creation date and last modification date via SQL.
I managed to get the meta.created and meta.lastModified via SCIM but I would really like to get this information via SQL directly if it's possible.
Example output (/scim/v2/Users?startIndex=200&count=1)
{
"schemas" : [ "urn:ietf:params:scim:api:messages:2.0:ListResponse" ],
"Resources" : [ {
"schemas" : [ "urn:ietf:params:scim:schemas:core:2.0:User", "urn:ietf:params:scim:schemas:extension:enterprise:2.0:User" ],
"id" : "1",
"meta" : {
"resourceType" : "User",
"created" : "2020-05-27T09:09:19Z",
"lastModified" : "2020-05-27T09:09:19Z"
},
"userName" : "synthesis_user",
"displayName" : "synthesis_user_display_name",
"name" : {
"givenName" : "synthesis_user_first_name",
"familyName" : "synthesis_user_last_name"
},
"emails" : [ {
"primary" : true,
"value" : "synthesis.user#snowflake.com"
} ],
"active" : false,
"urn:ietf:params:scim:schemas:extension:enterprise:2.0:User" : {
"defaultWarehouse" : "synthesis_warehouse",
"defaultRole" : "synthesis_role",
"snowflakeUserName" : "synthesis_name"
}
} ],
"totalResults" : 1,
"startIndex" : 1,
"itemsPerPage" : 1
}
I managed to get the creation date via the column CREATED_ON in "SNOWFLAKE"."ACCOUNT_USAGE"."USERS":
SELECT CREATED_ON FROM "SNOWFLAKE"."ACCOUNT_USAGE"."USERS" WHERE NAME='xxxx';
But there is no other column there that indicates a "last modification date" (the only columns of type TIMESTAMP_LTZ are CREATED_ON, DELETED_ON, BYPASS_MFA_UNTIL, LAST_SUCCESS_LOGIN, EXPIRES_AT, LOCKED_UNTIL_TIME).
So my question is: is there some other place where I can find out when the last change was made to the user? By a change made to the user I mean a change to the objectProperties of objectType user, in other words the last ALTER USER xxxx SET ...
To the best of my knowledge, a user's last-modified date is not available via SQL at this point.
{
"_id" : 654321,
"first_name" : "John",
"last_name" : "Doe",
"interested_by" : [ "electronics", "sports", "music" ],
"address" : {
"name" : "John Doe",
"company" : "Resultri",
"street" : "1015 Mapple Street",
"city" : "San Francisco",
"state" : "CA",
"zip_code" : 94105
}
}
How can I find the names of the elements in the array 'interested_by' using a command?
You can get the size of your result in the mongo shell using:
db.collection.count()
Replace collection with the name of your collection. You can also add a find condition like this:
db.collection.find().count()
That way, you can restrict your result with different clauses before counting the matching documents.
Edit: don't forget to run the command use databaseName first; if you're not in your database, it doesn't work.
You can count the number of keys by doing:
var count = Object.keys(myObject).length;
I have data for customers with more than one address, with a JSON representation like this:
{
"firstName" : "Max",
"lastName" : "Mustermann",
"addresses" : [{
"city" : "München",
"houseNumber" : "1",
"postalCode" : "87654",
"street" : "Leopoldstraße"
}, {
"city" : "Berlin",
"houseNumber" : "2a",
"postalCode" : "12345",
"street" : "Kurfürstendamm"
}
]
}
This JSON is stored in a column named json of data type json in a table named customer.
I want to query like this:
SELECT *
FROM customer cust,
json_array_elements(cust.json#>'{addresses}') as adr
WHERE adr->>'city' like '%erlin'
and adr->>'street' like '%urf%';
The query works fine, but I can't create an index that PostgreSQL 9.3.4 can use.
Any idea?
In my script (PHP), I use the tz database label for each location in my database:
http://en.wikipedia.org/wiki/Tz_database
I am entering a whole series of places (countries, provinces, states, cities, etc.) from around the world, and for each, I have found it very difficult to know which label from the IANA Time Zone Database to use.
Searching for the location on Wikipedia gives me the time offset (UTC+n) and the time zone abbreviation (EST, etc.), but neither helps in knowing which entry to use in the tz database.
E.g.:
Washington DC uses, as best as I could figure out, 'America/New_York'.
Dallas, TX uses 'America/Chicago' (unless I'm mistaken!)
There is a small African country for which neither its capital city nor its largest city was in the tz database. Which tz entry should I use?
There must be a database somewhere, or a resource which clearly links every location on earth (country, state, province, city, etc.) to a specific entry in the tz database. Where can I find it?
Combining the Google Maps APIs works fine. For example, use the Geocoding API call below to get a location from an address.
The address is "Mountain View, CA, US":
https://maps.googleapis.com/maps/api/geocode/json?address=Mountain+View,+CA,+US&key=YOUR_API_KEY
This URL returns JSON something like this:
{
"results" : [
{
"address_components" : [
{
"long_name" : "Mountain View",
"short_name" : "Mountain View",
"types" : [ "locality", "political" ]
},
{
"long_name" : "Santa Clara County",
"short_name" : "Santa Clara County",
"types" : [ "administrative_area_level_2", "political" ]
},
{
"long_name" : "California",
"short_name" : "CA",
"types" : [ "administrative_area_level_1", "political" ]
},
{
"long_name" : "United States",
"short_name" : "US",
"types" : [ "country", "political" ]
},
{
"long_name" : "94043",
"short_name" : "94043",
"types" : [ "postal_code" ]
}
],
"formatted_address" : "Mountain View, CA 94043, USA",
"geometry" : {
"location" : {
"lat" : 37.4224764,
"lng" : -122.0842499
},
"location_type" : "GEOMETRIC_CENTER",
"viewport" : {
"northeast" : {
"lat" : 37.4238253802915,
"lng" : -122.0829009197085
},
"southwest" : {
"lat" : 37.4211274197085,
"lng" : -122.0855988802915
}
}
},
"place_id" : "ChIJ2eUgeAK6j4ARbn5u_wAGqWA",
"types" : [ "street_address" ]
}
],
"status" : "OK"
}
You can then extract the location from the JSON and pass it to the Time Zone API.
API call sample:
https://maps.googleapis.com/maps/api/timezone/json?location=39.6034810,-119.6822510&timestamp=1331161200&key=YOUR_API_KEY
That returns JSON like this:
{
"dstOffset" : 0,
"rawOffset" : -28800,
"status" : "OK",
"timeZoneId" : "America/Los_Angeles",
"timeZoneName" : "Pacific Standard Time"
}
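The two calls above can be chained roughly as follows. This is a minimal Python sketch that only builds the request URLs and pulls the coordinates out of a geocoding response shaped like the sample shown earlier; it performs no network requests, and the function names geocode_url, extract_location, and timezone_url are hypothetical.

```python
from urllib.parse import urlencode

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"
TIMEZONE_URL = "https://maps.googleapis.com/maps/api/timezone/json"

def geocode_url(address: str, api_key: str) -> str:
    """Build the geocoding request URL for a free-form address."""
    # safe="," keeps commas readable; spaces are encoded as "+"
    return GEOCODE_URL + "?" + urlencode({"address": address, "key": api_key}, safe=",")

def extract_location(geocode_response: dict):
    """Return (lat, lng) of the first result, or None if nothing was found."""
    if geocode_response.get("status") != "OK" or not geocode_response.get("results"):
        return None
    loc = geocode_response["results"][0]["geometry"]["location"]
    return loc["lat"], loc["lng"]

def timezone_url(lat: float, lng: float, timestamp: int, api_key: str) -> str:
    """Build the time zone request URL for a coordinate pair and a Unix timestamp."""
    params = {"location": "{},{}".format(lat, lng), "timestamp": timestamp, "key": api_key}
    return TIMEZONE_URL + "?" + urlencode(params, safe=",")
```

The timeZoneId field of the second response (for example "America/Los_Angeles") is the tz database label to store for the location.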
One approach you might consider would be to find the latitude and longitude of the location in question.
Then you can look up the applicable time zone for those coordinates using one of the methods described here.
The best I have found so far is this:
http://twiki.org/cgi-bin/xtra/tzdatepick.html
but it only has cities.
E.g. it lists Paris, but it doesn't list France. If a whole country is included within a single timezone (e.g. any city in France would be listed under Europe/Paris) then I'd like to know it.
There is no listing for Idaho (which is split between two zones), nor for Moscow, Idaho.
I am using the following sources for the US:
http://en.wikipedia.org/wiki/List_of_U.S._states_by_time_zone
but a lot of guesswork is needed for any state and city.
This list is barely usable but the best I've got so far:
http://en.wikipedia.org/wiki/List_of_tz_database_time_zones
Moscow, Idaho?
Fort Wayne, Indiana?
Hard to tell from the above list when we don't personally know the said locations.