Change commercetools Product Projection Search sort rules

Good afternoon. The standard sorting in commercetools sorts alphabetically, with special characters first, then numbers, then letters. I need records to be returned in this order:
A-Z
numbers
special characters
and if a record starts with a space, that space should be ignored. For example:
"A"
" B"
"9"
"("
Is it possible to do this with the standard tools of commercetools? The documentation only describes sorting in ascending and descending order, but I need to define a different sorting principle.
I'm trying to use the queries described in the documentation:
products-search#sorting

Currently, the sort feature cannot be customized as you describe.
As mentioned in the documentation, if multiple sort expressions are specified via multiple sort parameters, they are combined into a composed sort where the results are first sorted by the first expression, followed by equal values being sorted according to the second expression, and so on. This could be helpful to your use case.
https://docs.commercetools.com/api/general-concepts#sorting
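As an illustration only (the field names here are placeholders and the parameters would need URL encoding), a composed sort on the Product Projection Search endpoint can be expressed by repeating the sort parameter:
GET /{projectKey}/product-projections/search?sort=name.en asc&sort=id asc
Results are ordered by the first expression, with ties broken by the second. Note that this still uses the platform's built-in collation, so on its own it will not produce the custom letters-numbers-special-characters order you describe.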
Best regards,
Michael

Related

splitting JSON string using regex

I want to split a JSON document which has a pattern like [[[1,2],[3,4][5,6]]] using regex. The pairs represent x and y. What I want to do is take this string and produce a list with {"1,2", "3,4", "5,6"}. Eventually I want to split the pairs. I was thinking I could make a list of {"1,2", "3,4", "5,6"} and use a for loop to split the pairs. Is this approach correct for getting the x and y separately?
JSON is not a regular language but a context-free language, and as such it cannot be matched by a regular expression. You need a full JSON parser like the ones referenced in the comments to your question.
... but, if you are going to have a fixed structure, like only three levels of square brackets, and with the structure you posted in your question, then there's a regexp that can parse it (it would be a subset of the JSON grammar, not general enough to parse other JSON content):
You'll have numbers: ([+-]?[0-9]+)
Then you'll have brackets and separators: \[\[\[, ,, \],\[ and \]\]\]
and finally, put all this together:
\[\[\[([+-]?[0-9]+),([+-]?[0-9]+)\],\[([+-]?[0-9]+),([+-]?[0-9]+)\],\[([+-]?[0-9]+),([+-]?[0-9]+)\]\]\]
and if you want to permit spaces between symbols, then you need:
\s*\[\s*\[\s*\[\s*([+-]?\d+)\s*,\s*([+-]?\d+)\s*\]\s*,\s*\[\s*([+-]?\d+)\s*,\s*([+-]?\d+)\s*\]\s*,\s*\[\s*([+-]?\d+)\s*,\s*([+-]?\d+)\s*\]\s*\]\s*\]\s*
This regexp has six matching groups that will match the corresponding integers in the matching string, as the following demo shows.
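The question doesn't show which language you are using; as a rough illustration in C, the pattern can be applied with POSIX extended regular expressions like this (the input string is hard-coded, and the closing square brackets are left unescaped, which ERE permits outside bracket expressions):
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    /* Three x,y pairs, each number matched by [+-]?[0-9]+ as above. */
    const char *pattern =
        "\\[\\[\\[([+-]?[0-9]+),([+-]?[0-9]+)],"
        "\\[([+-]?[0-9]+),([+-]?[0-9]+)],"
        "\\[([+-]?[0-9]+),([+-]?[0-9]+)]]]";
    const char *input = "[[[1,2],[3,4],[5,6]]]";  /* pairs comma-separated, as the pattern expects */
    regex_t re;
    regmatch_t m[7];                              /* whole match + 6 capture groups */
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) {
        fprintf(stderr, "bad pattern\n");
        return 1;
    }
    if (regexec(&re, input, 7, m, 0) == 0) {
        for (int i = 1; i <= 6; i += 2) {
            /* strtol stops at the first non-digit after each captured group. */
            long x = strtol(input + m[i].rm_so, NULL, 10);
            long y = strtol(input + m[i + 1].rm_so, NULL, 10);
            printf("x=%ld y=%ld\n", x, y);
        }
    }
    regfree(&re);
    return 0;
}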
Clarification
Regular languages (and the regular grammars and regular expressions that describe them) form a class of languages with many practical properties, for example:
You can parse them efficiently in one pass with what is called a finite automaton
You can define the automaton to accept language sentences simply with a regular expression.
You can easily combine regexps (or automata) to build acceptors for more complex languages (the union of language sets, intersection, symmetric difference, concatenation, etc.).
You can easily decide whether the language defined by one regular expression is a subset or superset of, or unrelated to, the language of another.
By contrast, this limits the power of the languages that can be defined:
you cannot define languages that allow nesting of subexpressions (like the bracketing you allow in JSON expressions or the tag nesting allowed in XML documents)
you cannot define languages which collect context and use it in another place of the sentence (for example, sentences that identify a number and have to match that same number in another place of the sentence)
But the point of my answer is that, if you bound the nesting depth (say, to three levels of brackets, like the example you posted), you can make your language regular and then parse it with a regular expression. That is not always easy, because it often leads to complex expressions (as you have seen in my answer), but it is not impossible, and you gain the ability to identify parts of the sentence as submatches of the regular subexpressions embedded in the global one.
If you want to allow unbounded nesting, you need to switch to context-free languages, which are defined with context-free grammars and are accepted by a more complex stack-based automaton. You then lose part of the set of operations you had:
In general, you can no longer decide whether one such language overlaps or includes another.
You can no longer freely construct acceptors for the intersection or difference of context-free languages (although union and concatenation still work).
But you will be able to match unbounded nested sentences. Normally, programming languages are defined with a context free grammar and a little more work for context checking (for example, to check if some identifier being used is actually defined in the declaration section or to match the starting and ending tag identifiers at matching levels in an XML document)
For context free languages, see this.
For regular languages, see this.
Second clarification
Since your question didn't say whether you wanted to match real (decimal) numbers, I have modified the demo to allow fixed-point numbers (not general floating point with exponential notation; you'll need to work that out yourself, as an exercise). Just run some tests and modify the regexp to adapt it to your needs.
(well, if you want to see the solution, look at it)
Yeah, I tried using the regex in my code but it is not working, so I am trying a different approach now. I have an idea of how to approach it but it is not really working. First off, let me be more clear on the question. What I am trying to do is parse a JSON document, like the image below. The file has strings with a [[[1,2],[3,4][5,6]]] pattern. What I am trying to get out of this is to have each pair as a list, so the list has x-y pairs.
the string structure
My approach: first replace the "[[" and "]]" at the beginning and at the end, so I have a string with the same pattern throughout, which gives me a string "[1,2],[3,4][5,6]". This is my code but it is not working. How do I fix it? The other thing I thought could be an issue is that the strings are not all the same length. So how do I replace just the beginning and the ending?
my code
Then I can use a regex split method to get a list that has the form {"1,2", "3,4", "5,6"}. I am not really sure how to do this, though.
Then I take the x and the y and add those to the list, so I get a list of x-y pairs. I would appreciate it if you could show me how to do this.
This is the approach I am working on, but if there is a better way of doing it I will be glad to see it.
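Since the code is only shown as an image, here is a rough C sketch of the strip-and-split idea (C is used just for illustration; the sample string is hard-coded, and strtok is given '[', ']' and ',' all as delimiters, which yields the numbers in order):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
    /* Modifiable copy of the raw string, using the pattern from the question. */
    char raw[] = "[[[1,2],[3,4][5,6]]]";
    /* Strip the outer "[[[" and "]]]", whatever the overall length. */
    char *body = raw + 3;
    raw[strlen(raw) - 3] = '\0';
    /* Tokenizing on brackets and commas yields x1, y1, x2, y2, ... */
    for (char *tok = strtok(body, "[],"); tok != NULL; ) {
        long x = strtol(tok, NULL, 10);
        tok = strtok(NULL, "[],");
        long y = (tok != NULL) ? strtol(tok, NULL, 10) : 0;
        printf("x=%ld y=%ld\n", x, y);
        tok = strtok(NULL, "[],");
    }
    return 0;
}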

How to leave ordering unchanged on equality using qsort [duplicate]

This question already has answers here:
Stabilizing the standard library qsort?
(3 answers)
Closed 7 years ago.
Straight from the qsort manual it says:
If two members compare as equal, their order in the sorted array is undefined.
However, I want qsort to leave the ordering unchanged on equality. That is to say, leave the ordering of two elements as they are in the original array if they have the same value used for ordering.
I'm not sure how to do this using qsort. I would rather not use an alternative sorting library/ roll my own.
Thanks
--
Edit: now that I know the distinction between stable and unstable sorts, I realise this is a duplicate.
You are essentially asking how to implement stable sorting on top of an unstable sort.
There is not much you can do: you only have control over the comparison function, so you'll have to work on that.
One possibility is to attach a second key ID to each element in the array that corresponds to the original index of that element. Then, if two elements compare equal, just return the corresponding order according to their original indexes.
As the GNU libc manual explains,
The only way to perform a stable sort with qsort is to first augment the objects with a monotonic counter of some kind.
So unless you can modify your comparison function so that objects you currently consider "equal" may be distinguished, you'll have to explicitly annotate your objects with indices before sorting, and remove the indices when you're done. You could use something like
struct indexed {
    void *obj;   /* pointer to the real element being sorted */
    int index;   /* original position, used to break ties */
};
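Putting that together, a self-contained sketch for an array of ints might look like this (the element type and names are just for illustration):
#include <stdio.h>
#include <stdlib.h>
struct indexed {
    void *obj;   /* points at the real element */
    int index;   /* original position, breaks ties */
};
static int cmp_int_stable(const void *a, const void *b) {
    const struct indexed *ia = a, *ib = b;
    int x = *(const int *)ia->obj, y = *(const int *)ib->obj;
    if (x != y)
        return (x > y) - (x < y);   /* primary key: the value itself */
    return ia->index - ib->index;   /* equal keys: keep the original order */
}
int main(void) {
    int values[] = { 3, 1, 3, 2, 1 };
    enum { N = sizeof values / sizeof values[0] };
    struct indexed wrapped[N];
    for (int i = 0; i < N; i++) {
        wrapped[i].obj = &values[i];
        wrapped[i].index = i;       /* remember the original position */
    }
    qsort(wrapped, N, sizeof wrapped[0], cmp_int_stable);
    for (int i = 0; i < N; i++)
        printf("%d (was at %d)\n", *(int *)wrapped[i].obj, wrapped[i].index);
    return 0;
}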
qsort stands for "quick sort", a fast but not stable sorting algorithm (and the C standard does not require qsort to be stable). A sorting algorithm is called stable if it preserves the original relative order of elements that have equal keys; if it may reorder elements with equal keys, it is unstable.
There are different solutions to your problem; I'll suggest two of them here.
Solution 1
You may find in Wikipedia some interesting ideas regarding the stability of sorting algorithms, including a method for using an unstable algorithm when you actually need a stable one: you extend the sorting key.
Suppose you are sorting an array of integers. Instead of that, you create a structure with two elements: the first (KEY) is your integer. The second (AUX_KEY) is a unique index. Before sorting, you linearly set AUX_KEY to reflect the original order. When sorting, pass to qsort a function that will sort using KEY, but will revert to using AUX_KEY if two keys are equal.
Solution 2
Instead of using quicksort, use a stable algorithm like merge sort, which you probably have in your programming environment (for example, here's the FreeBSD manpage for mergesort; the same may be available in other programming environments under the same name -- please check) or, if speed is not critical, use insertion sort (which you can code yourself -- it takes only a few lines of pseudocode). Insertion sort has quadratic time complexity (time proportional to n^2), while mergesort and quicksort are O(n log(n)).
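For reference, a straightforward insertion sort in C really is only a few lines, and it is naturally stable because equal elements never pass each other (int keys are used here just as an example):
/* Stable insertion sort for an int array. */
void insertion_sort(int *a, int n) {
    for (int i = 1; i < n; i++) {
        int key = a[i];
        int j = i - 1;
        while (j >= 0 && a[j] > key) {   /* strictly greater, so stability holds */
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}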
What qsort is telling you here is that its algorithm cannot guarantee the resulting order of two equal members.
What you can do is implement/use another algorithm, or force the elements to sort the way they originally were:
If your comparison function would report equality, look at how the elements were originally ordered, and order them the same way.

does postgresql array type preserve order of array?

I read over the docs for PostgreSQL v 9.3 arrays (http://www.postgresql.org/docs/9.3/static/arrays.html), but I don't see the question of ordering covered. Can someone confirm that Postgres preserves the insertion order/original order of an array when it's inserted into an array column? This seems to be the case but I would like absolute confirmation.
Thank you.
The documentation is entirely clear that arrays are useful in scenarios where order is important, inasmuch as it explicitly documents querying against specific positions within an array; if those positions were not reliable, such queries would have no meaning. (Using the word "array" is also clear on this point, since it is a term of art: an array is an ordered datatype by its nature; an unordered collection that allows duplicates would be a bag, not an array, just as an unordered collection without duplicates would be a set.)
See the examples given in section 8.1.4.3, of "pay by quarter", with index position within the array indicating which quarter is being queried against.
I cannot find it in the documentation, but I'm pretty sure: yes, order is preserved, and [2,4,5] is different from [5,2,4].
If that were not the case, array subscripts could not work.

How to efficiently search large dataset for substrings?

I have a large set of short strings. What are some algorithms and indexing strategies for filtering the list on items that contain a substring? For example, suppose I have a list:
val words = List(
"pick",
"prepick",
"picks",
"picking",
"kingly"
...
)
How could I find strings that contain the substring "king"? I could brute force the problem like so:
words.filter(_.indexOf("king") != -1) // yields List("picking", "kingly")
This is only practical for small sets; today I need to support 10 million strings, with a future goal in the billions. Obviously I need to build an index. What kind of index?
I have looked at using an ngram index stored in MySQL, but I am not sure if this is the best approach. I'm not sure how to optimally query the index when the search string is longer than the ngram size.
I have also considered using Lucene, but this is optimized around token matching, not substring matching, and does not seem to support the requirement of simple substring matching. Lucene does have a few classes related to ngrams (org.apache.lucene.analysis.ngram.NGramTokenFilter is one example), but these seem to be intended for spell check and autocomplete use cases, not substring matching, and the documentation is thin.
What other algorithms and indexing strategies should I consider? Are there any open source libraries that support this? Can the SQL or Lucene strategies (above) be made to work?
Another way to illustrate the requirement is with SQL:
SELECT word FROM words WHERE word LIKE CONCAT('%', ?, '%');
Where ? is a user provided search string, and the result is a list of words that contain the search string.
How big is the longest word?
If that's about 7-8 characters, you may generate all substrings of each string and insert those substrings into a trie (the one used in Aho-Corasick - http://en.wikipedia.org/wiki/Aho-Corasick).
It will take some time to build the tree, but then searching for all occurrences will be O(length of the searched word).
Postgres has a module (pg_trgm) which provides a trigram index.
That seems an interesting idea too: building a trigram index.
Regarding the comment in your question about how to handle search strings longer than the n-gram length:
Here's one approach which will work:
Say we have the search string "abcde", and we have built a trigram index. (Your strings are of small lengths - this could hit a sweet spot for you.)
Let the search results be abc = S1, bcd = S2, cde = S3 (where S1, S2, S3 are sets of indexes).
Then the longest common substring of S1, S2, S3 will give the indexes that we want.
We can transform each set of indexes into a single string separated by a delimiter (say, a space) before doing the LCS.
After we find the LCS, we would have to check those indexes for the complete pattern, since we have broken down the search term; i.e. we would have to prune results such as "abc-XYZ-bcd-HJI-cde".
The LCS of a set of strings can be found efficiently with suffix arrays or suffix trees.
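As a toy C sketch of the look-up / combine / verify idea (the posting lists are hard-coded here; in practice they would come from the trigram index, and the combination step below simply intersects the sorted id lists before the final verification):
#include <stdio.h>
#include <string.h>
static const char *words[] = { "pick", "prepick", "picks", "picking", "kingly" };
/* Intersect two sorted id lists into out; returns the result length. */
static int intersect(const int *a, int na, const int *b, int nb, int *out) {
    int i = 0, j = 0, n = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) i++;
        else if (a[i] > b[j]) j++;
        else { out[n++] = a[i]; i++; j++; }
    }
    return n;
}
int main(void) {
    const char *query = "king";            /* trigrams: "kin", "ing" */
    /* Pretend these sorted posting lists came from the trigram index. */
    int kin[] = { 3, 4 };                  /* ids of words containing "kin" */
    int ing[] = { 3, 4 };                  /* ids of words containing "ing" */
    int cand[8];
    int n = intersect(kin, 2, ing, 2, cand);
    /* Trigrams can match non-contiguously, so verify the full query. */
    for (int i = 0; i < n; i++)
        if (strstr(words[cand[i]], query) != NULL)
            printf("%s\n", words[cand[i]]);
    return 0;
}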

The C Programming Language, 2nd ed. question

I'm reading the well-known book "The C Programming Language, 2nd edition" and there is one exercise that I'm stuck on. I can't figure out what exactly needs to be done, so I would like someone to explain it to me.
It's Exercise 5-17:
Add a field-searching capability, so sorting may be done on fields within lines, each field sorted according to an independent set of options.
What input does the program expect from the command line, and what is meant by an "independent set of options"?
Study the POSIX sort utility, ignoring the legacy options. Or study the GNU sort program; it has even more options than POSIX sort does.
You need to decide between fixed-width fields as suggested by Neil Butterworth in his answer and variable-width fields. You need to decide on what character separates variable-width fields. You need to decide on which sorting modes to support for each field (string, case-folded string, phone-book string, integer, floating point, date, etc) as well as sort direction (forward/reverse or ascending/descending).
The 'independent options' means that you can have different sort criteria for different fields. That is, you can arrange for field 1 to be sorted in ascending string order, field 3 to be sorted in descending integer order, and field 9 to be sorted in ascending date order.
Note that when sorting, the primary criterion is the first key field specified. When two rows are compared, if there is a difference between the first key field in the two rows, then the subsequent key fields are never considered. When two rows are the same in the first key field, then the criterion for the second key field determines the relative order; then, if the second key fields are the same, the third key field is consulted, and so on. If there are no more key fields specified, then the usual default sort criterion is "the whole line of input in ascending string order". A stable sort preserves the relative order of two rows in the original data that are the same when compared using the key field criteria (instead of using the default, whole-line comparison).
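A rough sketch of how such a multi-key comparison could be wired up in C (the key descriptors, the whitespace-separated field convention, and the function names are just illustrative; the real exercise also has to parse these options from the command line):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
/* One sort criterion: which field, how to compare it, and direction. */
struct key {
    int field;     /* 1-based field number within the line */
    int numeric;   /* 1 = compare as a number, 0 = as a string */
    int reverse;   /* 1 = descending */
};
/* Copy the k-th whitespace-separated field of line into buf ("" if absent). */
static void getfield(const char *line, int k, char *buf, size_t bufsize) {
    size_t n = 0;
    buf[0] = '\0';
    while (*line) {
        while (*line && isspace((unsigned char)*line)) line++;
        if (!*line) return;
        if (--k == 0) {
            while (*line && !isspace((unsigned char)*line) && n + 1 < bufsize)
                buf[n++] = *line++;
            buf[n] = '\0';
            return;
        }
        while (*line && !isspace((unsigned char)*line)) line++;
    }
}
static int compare_lines(const char *a, const char *b,
                         const struct key *keys, int nkeys) {
    char fa[256], fb[256];
    for (int i = 0; i < nkeys; i++) {
        getfield(a, keys[i].field, fa, sizeof fa);
        getfield(b, keys[i].field, fb, sizeof fb);
        int r;
        if (keys[i].numeric) {
            double x = atof(fa), y = atof(fb);
            r = (x > y) - (x < y);
        } else {
            r = strcmp(fa, fb);
        }
        if (keys[i].reverse) r = -r;
        if (r != 0) return r;           /* the first differing key decides */
    }
    return strcmp(a, b);                /* all keys equal: whole-line default */
}
int main(void) {
    /* Field 1 ascending as a string, field 3 descending as a number. */
    struct key keys[] = { {1, 0, 0}, {3, 1, 1} };
    const char *x = "adams engineering 2020";
    const char *y = "adams engineering 2023";
    printf("%d\n", compare_lines(x, y, keys, 2));  /* positive: y sorts first */
    return 0;
}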
It's referring to the ability to specify subfields in each row to sort by. For example:
sort -f1:4a -f20:28d somefile.txt
would sort on the field beginning at character position 1 and extending to position 4, ascending, and within that sort on the field beginning at position 20 and extending to position 28, descending.
Of course, there are lots of other ways to specify fields, sort order etc. Designing the command line switches is one of the points of the exercise, IMHO.

Resources