DocumentDB SQL Injection? - sql-server

I'm trying to offload some client-specific query building to the client. I don't think I'm in danger of SQL injection with DocumentDB since it doesn't have UPDATE or DELETE statements, but I'm not positive. Additionally, I don't know if these will be added in the future.
Here is an example of my problem.
IceCreamApp wants to find all flavors where the name is like "choco". A flavor document looks like this-
{
"name": "Chocolate",
"price": 1.50
}
The API knows about the DocumentDB and knows how to request data from it, but it doesn't know the entity structure of any of the client's entities. So to do this on the API-
_documentClient.CreateDocumentQuery("...")
.Where((d) => d.name.Contains(query));
Would throw an error (d is dynamic and name isn't necessarily a common property).
I could build this on the client and send it.
Client search request-
{
"page": 1,
"pageSize": 10,
"query": "CONTAINS(name, 'choco')"
}
Without sanitization this would be a big no-no for SQL. But does it / will it ever matter for DocumentDB? How safe am I to run unsanitized client queries?

As the official document Announcing SQL Parameterization in DocumentDB explains:
Using this feature, you can now write parameterized SQL queries. Parameterized SQL provides robust handling and escaping of user input, preventing accidental exposure of data through “SQL injection”. Let's take a look at a sample using the .NET SDK. In addition to plain SQL strings and LINQ expressions, we've added a new SqlQuerySpec class that can be used to build parameterized queries.
DocumentDB is not susceptible to the most common kinds of injection attacks that lead to “elevation of privileges” because queries are strictly read-only operations. However, it might be possible for a user to gain access to data they shouldn’t be accessing within the same collection by crafting malicious SQL queries. SQL parameterization support helps prevent these sort of attacks.
Here's an official sample that queries a "Books" collection with a single user-supplied parameter for author name:
POST https://contosomarketing.documents.azure.com/dbs/XP0mAA==/colls/XP0mAJ3H-AA=/docs HTTP/1.1
x-ms-documentdb-isquery: True
x-ms-date: Mon, 18 Aug 2014 13:05:49 GMT
authorization: type%3dmaster%26ver%3d1.0%26sig%3dkOU%2bBn2vkvIlHypfE8AA5fulpn8zKjLwdrxBqyg0YGQ%3d
x-ms-version: 2014-08-21
Accept: application/json
Content-Type: application/query+json
Host: contosomarketing.documents.azure.com
Content-Length: 50
{
"query": "SELECT * FROM books b WHERE (b.Author.Name = #name)",
"parameters": [
{"name": "#name", "value": "Herman Melville"}
]
}
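Applied to the flavor search above, here is a minimal sketch of what a parameterized query could look like with the .NET SDK's SqlQuerySpec; the collection link and variable names are placeholders, not taken from the question:
using System.Linq;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

// Accept only the raw search term (e.g. "choco") from the client and bind it
// as a parameter instead of splicing client-built SQL into the query text.
var querySpec = new SqlQuerySpec
{
    QueryText = "SELECT * FROM flavors f WHERE CONTAINS(f.name, @query)",
    Parameters = new SqlParameterCollection
    {
        new SqlParameter("@query", query)
    }
};

var flavors = _documentClient
    .CreateDocumentQuery<dynamic>(collectionLink, querySpec)
    .AsEnumerable()
    .ToList();
This keeps the client request down to plain values (page, pageSize, query term) while the API owns the query text.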

Related

How to print the count of array elements along with another variable in MongoDB

I have a data collection which contains records in the following format.
{
"_id": 22,
"title": "Hibernate in Action",
"isbn": "193239415X",
"pageCount": 400,
"publishedDate": ISODate("2004-08-01T07:00:00Z"),
"thumbnailUrl": "https://s3.amazonaws.com/AKIAJC5RLADLUMVRPFDQ.book-thumb-images/bauer.jpg",
"shortDescription": "\"2005 Best Java Book!\" -- Java Developer's Journal",
"longDescription": "Hibernate practically exploded on the Java scene. Why is this open-source tool so popular Because it automates a tedious task: persisting your Java objects to a relational database. The inevitable mismatch between your object-oriented code and the relational database requires you to write code that maps one to the other. This code is often complex, tedious and costly to develop. Hibernate does the mapping for you. Not only that, Hibernate makes it easy. Positioned as a layer between your application and your database, Hibernate takes care of loading and saving of objects. Hibernate applications are cheaper, more portable, and more resilient to change. And they perform better than anything you are likely to develop yourself. Hibernate in Action carefully explains the concepts you need, then gets you going. It builds on a single example to show you how to use Hibernate in practice, how to deal with concurrency and transactions, how to efficiently retrieve objects and use caching. The authors created Hibernate and they field questions from the Hibernate community every day - they know how to make Hibernate sing. Knowledge and insight seep out of every pore of this book.",
"status": "PUBLISH",
"authors": ["Christian Bauer", "Gavin King"],
"categories": ["Java"]
}
I want to print the title and the authors count where the number of authors is greater than 4.
I used the following command to extract records which have more than 4 authors.
db.books.find({authors:{$exists:true},$where:'this.authors.length>4'},{_id:0,title:1});
But I am unable to print the number of authors along with the title. I tried the following command too, but it gave only the title list.
db.books.find({authors:{$exists:true},$where:'this.authors.length>4'},{_id:0,title:1,'this.authors.length':1});
Could you please help me to print the number of authors here along with the title?
You can use the aggregation framework's $project with $size to reshape your data and then $match to apply the filtering condition:
db.collection.aggregate([
    {
        $project: {
            title: 1,
            authorsCount: { $size: "$authors" }
        }
    },
    {
        $match: {
            authorsCount: { $gt: 4 }
        }
    }
])
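If you also want to drop _id and keep only title and authorsCount, as in your original projection, that can go in the same $project stage, for example:
{ $project: { _id: 0, title: 1, authorsCount: { $size: "$authors" } } }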
Mongo Playground

Multiple insert into mongodb - only the first collection gets updated

I am trying to update my collections in my mongodb instance hosted on mlab.
I am running the following code:
...
db.collectionOne.insert(someArrayOfJson)
db.collectionTwo.insert(someArrayOfJson)
The first collection gets updated and the second doesn't.
Using the same/different valid Json arrays produce the same outcome. Only the first gets updated.
I have seen this question duplicate document - same collection and I can understand why it wouldn't work. But my problem is across two separate collections?
When inserting the data manually on mlab the document goes into the second collection fine - so I am led to believe it allows duplicate data across separate collections.
I am new to mongo - am I missing something simple?
Update:
The response is:
22:01:53.224 [main] DEBUG org.mongodb.driver.protocol.insert - Inserting 20 documents into namespace db.collectionTwo on connection [connectionId{localValue:2, serverValue:41122}] to server ds141043.mlab.com:41043
22:01:53.386 [main] DEBUG org.mongodb.driver.protocol.insert - Insert completed
22:01:53.403 [main] DEBUG org.mongodb.driver.protocol.insert - Inserting 20 documents into namespace db.collectionOne on connection [connectionId{localValue:2, serverValue:41122}] to server ds141043.mlab.com:41043
22:01:55.297 [main] DEBUG org.mongodb.driver.protocol.insert - Insert completed
But there is nothing entered into the db for the second dataset.
Update v2:
If I make a call after the two inserts such as:
db.createCollection("log", { capped : true, size : 5242880, max : 5000 } )
The data collections get updated!
What is the total data size?
Here is a sample that works for me:
db.collectionOne.insert([{"name1":"John","age1":30,"cars1":[ "Ford", "BMW", "Fiat"]},{"name1":"John","age1":30,"cars1":[ "Ford", "BMW", "Fiat" ]}]); db.collectionTwo.insert([{"name2":"John","age2":30,"cars2":[ "Ford", "BMW", "Fiat"]},{"name2":"John","age2":30,"cars2":[ "Ford", "BMW", "Fiat" ]}])
If the data is larger, you can use MongoDB Bulk Write Operations (a sketch follows below), and you can also refer to the MongoDB limits and thresholds documentation:
https://docs.mongodb.com/manual/reference/limits
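As a rough sketch, a bulk write in the mongo shell (db.collection.bulkWrite is available since MongoDB 3.2; the documents below are placeholders) could look like this:
// One acknowledged bulk operation per collection.
db.collectionOne.bulkWrite([
    { insertOne: { document: { name1: "John", age1: 30 } } },
    { insertOne: { document: { name1: "Jane", age1: 28 } } }
], { ordered: true });

db.collectionTwo.bulkWrite([
    { insertOne: { document: { name2: "John", age2: 30 } } },
    { insertOne: { document: { name2: "Jane", age2: 28 } } }
], { ordered: true });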
How did you determine that the second collection did not get updated?
I believe you are simply seeing the difference between NoSQL and SQL databases. A SQL database will guarantee you that a read after a successful write will read the data you just wrote. A NoSQL database does not guarantee that you can immediately read data you just wrote. See this answer for more details.

Solr request: SQL-like JOIN, GROUP BY, SUM(), WHERE SUM()

I'm new to Solr and I have the following problem:
I have those documents:
category:contract:
{
"contract_id_s": "contract-ENG-00001",
"title_s": "contract title",
"ref_easy_s": "REFAAA",
"commitment_id_s": "ENG-00001",
},
category:commitment:
{
"commitment_id_s": "ENG-00001",
"title_s": "commitment title",
"status_s": "Validated",
"date_changed_status_s": "2015-09-30",
"date_status_initiated_s": "2015-09-27",
"date_status_confirmed_s": "2015-09-28",
"date_status_validated_s": "2015-09-30",
},
category:commitment AND sub_category_s:commitment_project:
{
"id": "ENG-00001_AAA",
"commitment_id_s": "ENG-00001",
"project_id_s": "AAA",
"project_name_s": "project name",
"project_amount_asked_s": "2000",
"project_amount_validated_s": "2100"
},
{
"id": "ENG-00001_AAA2",
"commitment_id_s": "ENG-00001",
"project_id_s": "AAA",
"project_name_s": "project name",
"project_amount_asked_s": "1000",
"project_amount_validated_s": "1200"
},
For each commitment, there could be a contract.
For each commitment, there could be some payments.
Here is what I want to do:
- by default, only select commitments that have at least:
. one sub_category_s:commitment_project with a project_amount_validated_s value.
. one contract.
- if filtered on amounts, only select within this list the commitments whose SUM of project_amount_validated_s is > amount_min AND < amount_max.
I don't know what the best practice is in terms of performance:
- Requesting the ids of the commitments and then requesting the details for them?
- Is there a way to JOIN the contract information in this request?
- Or is the best practice to request each document one by one?
The problem is that I don't want to request useless data (performance, bandwidth).
There are some tools available to you in the form of:
Solr's Block Join Query Parser (which allows for simple parent/child queries).
Solr Facets (which allow for aggregations (e.g. sum of payments) ... with recent support for faceting on parent/child fields).
The Solr Expand Component (which recently allows parent information to be expanded from a child block join query).
However, I'm not certain you can do everything you're hoping for in one query (using these pieces). And even if you can, stitching them together doesn't even come close to the simplicity of the SELECT...JOIN...GROUP BY...HAVING SQL query you're hoping to replicate. (Unless you want to try out the Solr 6 developer snapshot with parallel SQL support.)
BUT If this is your only use-case, AND Solr is not your primary datastore, I'd strongly recommend modeling your Solr data to fit your use-case.
E.g. Start simple, denormalize, and only include the fields in your datamodel needed for search:
Only one type of record: commitment
Fields
commitment_id_s
title_s
status_s
date_changed_status_s
date_status_initiated_s
date_status_confirmed_s
date_status_validated_s
total_payments_asked (numeric sum of project_amount_asked from DB)
total_payments_validated (numeric sum of project_amount_validated from DB)
project_names (multiValued list of searchable project names)
contract_names (multiValued list of searchable contract names)
Then your query just needs a filter:
total_payments_validated:[<amount_min> TO <amount_max>]
to enforce your default criteria.
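For instance, a request against that denormalized schema could look like the following (the core name "commitments" and the amounts are made up):
http://localhost:8983/solr/commitments/select?q=*:*&fq=contract_names:[*+TO+*]&fq=total_payments_validated:[1000+TO+5000]&fl=commitment_id_s,title_s,total_payments_validated&wt=json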
Once your search has identified the commitment IDs matching the Solr query, then go back and query the source database for any additional information needed for display (project details, contract details, dates, etc...)
Ok, I've found a solution by using !join.
For instance, in PHP:
[
'q' => "{!join from=id to=service_id score=none}uri:\\$serviceUri* AND -deleted:true",
'fq' => "{!cache=false}category:monthly_volume AND type:\"$type\" AND timestamp:[$strDateStart TO $strDateEnd]",
'alt' => 'json',
'max-results' => 1000,
'sort' => 'timestamp ASC',
'statsFields' => 'stats.field=value&stats.facet=timestamp',
]
Or with URL request:
http://localhost:8983/solr/fluks-admin/select?q={!join+from=id+to=sector_id+score=none}{!join+from=uri+to=service+score=none}uri:/test-en/service-en*+AND+-deleted:true&fq={!cache=false}category:indicator+AND+timestamp:[201608+TO+201610]+AND+type:("-3"+OR+2+OR+3)+AND+-deleted:true&wt=json&indent=true&json.facet={sum_timestamp:{terms:{limit:-1, field:timestamp, facet:{sum_type:{terms:{limit:-1, field:type, facet:{sum_vol_value:"sum(vol_value)"}}}}}}}

Generalized way to extract JSON from a relational database?

Ok, maybe this is too broad for StackOverflow, but is there a good, generalized way to assemble data in relational tables into hierarchical JSON?
For example, let's say we have a "customers" table and an "orders" table. I want the output to look like this:
{
"customers": [
{
"customerId": 123,
"name": "Bob",
"orders": [
{
"orderId": 456,
"product": "chair",
"price": 100
},
{
"orderId": 789,
"product": "desk",
"price": 200
}
]
},
{
"customerId": 999,
"name": "Fred",
"orders": []
}
]
}
I'd rather not have to write a lot of procedural code to loop through the main table and fetch orders a few at a time and attach them. It'll be painfully slow.
The database I'm using is MS SQL Server, but I'll need to do the same thing with MySQL soon. I'm using Java and JDBC for access. If either of these databases had some magic way of assembling these records server-side it would be ideal.
How do people migrate from relational databases to JSON databases like MongoDB?
Here is a useful set of functions for converting relational data to JSON and XML and from JSON back to tables: https://www.simple-talk.com/sql/t-sql-programming/consuming-json-strings-in-sql-server/
SQL Server 2016 is finally catching up and adding support for JSON.
The JSON support still does not match other products such as PostgreSQL, e.g. no JSON-specific data type is included. However, several useful T-SQL language elements were added that make working with JSON a breeze.
E.g. in the following Transact-SQL code a text variable containing a JSON string is defined:
DECLARE @json NVARCHAR(4000)
SET @json =
N'{
"info":{
"type":1,
"address":{
"town":"Bristol",
"county":"Avon",
"country":"England"
},
"tags":["Sport", "Water polo"]
},
"type":"Basic"
}'
and then, you can extract values and objects from JSON text using the JSON_VALUE and JSON_QUERY functions:
SELECT
JSON_VALUE(@json, '$.type') as type,
JSON_VALUE(@json, '$.info.address.town') as town,
JSON_QUERY(@json, '$.info.tags') as tags
Furthermore, the OPENJSON function allows you to return elements from a referenced JSON array:
SELECT value
FROM OPENJSON(@json, '$.info.tags')
Last but not least, there is a FOR JSON clause that can format a SQL result set as JSON text:
SELECT object_id, name
FROM sys.tables
FOR JSON PATH
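And to produce the nested customers/orders document from the original question, FOR JSON PATH can be combined with a correlated subquery. A sketch, assuming Customers and Orders tables with the column names shown (adjust to your schema):
-- Table and column names are assumptions; only the FOR JSON pattern matters.
SELECT
    c.CustomerId AS customerId,
    c.Name AS [name],
    (SELECT o.OrderId AS orderId,
            o.Product AS product,
            o.Price AS price
     FROM Orders o
     WHERE o.CustomerId = c.CustomerId
     FOR JSON PATH) AS orders          -- nested array per customer
FROM Customers c
FOR JSON PATH, ROOT('customers');
-- Note: customers without orders come back with "orders": null rather than [].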
Some references:
https://learn.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server
https://learn.microsoft.com/en-us/sql/relational-databases/json/convert-json-data-to-rows-and-columns-with-openjson-sql-server
https://blogs.technet.microsoft.com/dataplatforminsider/2016/01/05/json-in-sql-server-2016-part-1-of-4/
https://www.red-gate.com/simple-talk/sql/learn-sql-server/json-support-in-sql-server-2016/
I think one 'generalized' solution would be as follows:
Create a 'select' query which joins all the required tables to fetch results in a 2-dimensional array (like a CSV / temporary table, etc.)
If each row of this join is unique, and the MongoDB schema and the columns have a one-to-one mapping, then it's all about importing this CSV/table using the mongoimport command with the required parameters.
But a case like the above, where a given customer ID can have an array of 'orders', needs some computation before mongoimport.
You will have to write a program which can 'vertically merge' the orders for a given customer ID. For a small set of data, a simple Java program will work. But for larger sets, parallel processing using Spark can do the job.
SQL Server 2016 now supports reading JSON in much the same way as it has supported XML for many years. Using OPENJSON to query directly and JSON datatype to store.
There is no generalized way because SQL Server doesn’t support JSON as its datatype. You’ll have to create your own “generalized way” for this.
Check out this article. There are good examples there on how to manipulate SQL Server data to JSON format.
https://www.simple-talk.com/blogs/2013/03/26/sql-server-json-to-table-and-table-to-json/

SOQL - Convert Date To Owner Locale

We use DBAmp for integrating Salesforce.com with SQL Server (which basically adds a linked server), and are running queries against our SF data using OPENQUERY.
I'm trying to do some reporting against opportunities and want to return the created date of the opportunity in the opportunity owners local date time (i.e. the date time the user will see in salesforce).
Our dbamp configuration forces the dates to be UTC.
I stumbled across a date function (in the Salesforce documentation) that I thought might be of some help, but I get an error when I try to use it, so I can't prove it. Below is the example usage for the convertTimezone function:
SELECT HOUR_IN_DAY(convertTimezone(CreatedDate)), SUM(Amount)
FROM Opportunity
GROUP BY HOUR_IN_DAY(convertTimezone(CreatedDate))
Below is the error returned:
OLE DB provider "DBAmp.DBAmp" for linked server "SALESFORCE" returned message "Error 13005 : Error translating SQL statement: line 1:37: expecting "from", found '('".
Msg 7350, Level 16, State 2, Line 1
Cannot get the column information from OLE DB provider "DBAmp.DBAmp" for linked server "SALESFORCE".
Can you not use SOQL functions in OPENQUERY as below?
SELECT
*
FROM
OPENQUERY(SALESFORCE,'
SELECT HOUR_IN_DAY(convertTimezone(CreatedDate)), SUM(Amount)
FROM Opportunity
GROUP BY HOUR_IN_DAY(convertTimezone(CreatedDate))')
UPDATE:
I've just had some correspondence with Bill Emerson (I believe he is the creator of the DBAmp Integration Tool):
You should be able to use SOQL functions so I am not sure why you are
getting the parsing failure. I'll setup a test case and report back.
I'll update the post again when I hear back. Thanks
A new version of DBAmp (2.14.4) has just been released that fixes the issue with using ConvertTimezone in openquery.
Version 2.14.4
Code modified for better memory utilization
Added support for API 24.0 (SPRING 12)
Fixed issue with embedded question marks in string literals
Fixed issue with using ConvertTimezone in openquery
Fixed issue with "Invalid Numeric" when using aggregate functions in openquery
I'm fairly sure that because DBAmp uses SQL and not SOQL, SOQL functions would not be available, sorry.
You would need to expose this data some other way. Perhaps it's possible with a Salesforce report, web-service, or compiling the data through the program you are using to access the (DBAmp) SQL Server.
If you were to create a Salesforce web service, the following example might be helpful.
global class MyWebService
{
    webservice static AggregateResult MyWebServiceMethod()
    {
        AggregateResult ar = [
            SELECT
                HOUR_IN_DAY(convertTimezone(CreatedDate)) Hour,
                SUM(Amount) Amount
            FROM Opportunity
            GROUP BY HOUR_IN_DAY(convertTimezone(CreatedDate))];
        system.debug(ar);
        return ar;
    }
}
