Additional space after the end of a string in MS-SQL. Trimming does nothing [duplicate] - sql-server

This question already has answers here:
How can I make SQL Server return FALSE for comparing varchars with and without trailing spaces?
(6 answers)
Why would SqlServer select statement select rows which match and rows which match and have trailing spaces
(4 answers)
Closed 5 months ago.
What's going on here? The last character is a space (ASCII 32), but SQL Server says the trimmed version is exactly the same as the untrimmed one. Moreover, the length of the keyword (LEN) equals the length of the trimmed keyword, and the space comes after what SQL Server considers to be the last character.
Note that these 143 rows are the only results in a table of more than a billion rows (>1,000,000,000).
select k.Keyword, len(k.Keyword) LenKeyword, trim(k.Keyword) TrimKeyword,
case when k.Keyword=trim(k.Keyword) then 1 else 0 end isSame,
len(trim(k.Keyword)) LenTrimKeyword, SUBSTRING(k.Keyword,len(k.keyword)+1,1) LastCharPlus1
, ASCII (SUBSTRING(k.Keyword,len(k.keyword)+1,1)) AsciiLastCharPlus1
from #tk AdditionalSpaceAfterEndOfString
inner join SE_Keywords k
on AdditionalSpaceAfterEndOfString.Keyword=k.Keyword and AdditionalSpaceAfterEndOfString.Lang=k.Lang and AdditionalSpaceAfterEndOfString.Country=k.Country
order by k.DateAdded desc
Keyword LenKeyword TrimKeyword isSame LenTrimKeyword LastCharPlus1 AsciiLastCharPlus1
------------------------------------------------------------------------------------------------------------------------------------------------------ ----------- ------------------------------------------------------------------------------------------------------------------------------------------------------ ----------- -------------- ------------- ------------------
bio arganöl 11 bio arganöl 1 11 32
assurance retrait de permis 27 assurance retrait de permis 1 27 32
call center, centre d'appel, algerie, alger, fran?ais, francophone, ogs, ogs, ogsolution, 89 call center, centre d'appel, algerie, alger, fran?ais, francophone, ogs, ogs, ogsolution, 1 89 32
esta 4 esta 1 4 32
google périsprit 16 google périsprit 1 16 32
huizen aanbod den bosch 23 huizen aanbod den bosch 1 23 32
recruiting pflege 17 recruiting pflege 1 17 32
test by keywords 17 test by keywords 1 17 32
employer branding pflege 24 employer branding pflege 1 24 32
employer branding pflege 24 employer branding pflege 1 24 32
lepelboom 9 lepelboom 1 9 32
sun plaisance 13 sun plaisance 1 13 32
keyboost 4 10 keyboost 4 1 10 32
morocco desert tours 20 morocco desert tours 1 20 32
vraag aanbod reclame 20 vraag aanbod reclame 1 20 32
bedrijfskleding drukwerk 24 bedrijfskleding drukwerk 1 24 32
bedrijfstuitje zeilen 21 bedrijfstuitje zeilen 1 21 32
bmw occasion 12 bmw occasion 1 12 32
marketing altenheim 19 marketing altenheim 1 19 32
marketing pflegeeinrichtung 27 marketing pflegeeinrichtung 1 27 32
marketing prijsbeker 20 marketing prijsbeker 1 20 32
... 1 41 32
abonnement culture paris 24 abonnement culture paris 1 24 32
kommunikationsagentur pflege 28 kommunikationsagentur pflege 1 28 32
personalmarketing pflege 24 personalmarketing pflege 1 24 32
personalmarketing pflege 24 personalmarketing pflege 1 24 32
salon de beauté pavillons-sous-bois 35 salon de beauté pavillons-sous-bois 1 35 32
sophos safeguard 16 sophos safeguard 1 16 32
bon achat spectacles fnac 25 bon achat spectacles fnac 1 25 32
catering alphen-aan-den-rijn 28 catering alphen-aan-den-rijn 1 28 32
drone electronics 17 drone electronics 1 17 32
pflege marketing 16 pflege marketing 1 16 32
pop! vinyl 10 pop! vinyl 1 10 32
prêt sans banque 16 prêt sans banque 1 16 32
sebastien izambard 18 sebastien izambard 1 18 32
carte scenes et sorties 23 carte scenes et sorties 1 23 32
dongen nieuws delen 19 dongen nieuws delen 1 19 32
newerkkabels 12 newerkkabels 1 12 32
quad and camel in marrakech 27 quad and camel in marrakech 1 27 32
showroom marques de luxe 24 showroom marques de luxe 1 24 32
Trimming does not help. The only thing that works is checking whether there is a character just past LEN(Keyword), which in every case here is a space:
select Keyword, Lang, Country into #tk from se_keywords where ascii (SUBSTRING(Keyword,len(keyword)+1,1)) is not null
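Going by SQL Server's documented string rules (the subject of the linked duplicates): LEN returns the length excluding trailing spaces, and = compares varchar values as if the shorter one were padded with spaces to the length of the longer one. So 'esta ' and 'esta' compare equal even though TRIM really did remove the space, while DATALENGTH, unlike LEN, does count the trailing space. The Python below is only a model of those rules, not SQL Server itself. The function names are hypothetical, and the model assumes single-byte varchar data and ignores collation.

```python
def sql_len(s: str) -> int:
    """Model of SQL Server LEN(): character count excluding trailing spaces."""
    return len(s.rstrip(" "))


def sql_datalength(s: str) -> int:
    """Model of DATALENGTH() for single-byte varchar: trailing spaces are counted."""
    return len(s)


def sql_equals(a: str, b: str) -> bool:
    """Model of '=' on varchar: pad the shorter operand with spaces, then compare.
    Collation (case, accents) is deliberately ignored in this sketch."""
    width = max(len(a), len(b))
    return a.ljust(width) == b.ljust(width)


def has_trailing_space(s: str) -> bool:
    """Same idea as the SUBSTRING(x, LEN(x) + 1, 1) check: is anything stored past LEN(x)?"""
    return sql_datalength(s) > sql_len(s)


keyword = "esta "              # one trailing space, as in the result set above
trimmed = keyword.strip(" ")   # TRIM() removes leading and trailing spaces
print(sql_len(keyword), sql_len(trimmed))                        # 4 4
print(sql_equals(keyword, trimmed))                              # True
print(has_trailing_space(keyword), has_trailing_space(trimmed))  # True False
```

In T-SQL itself, comparing DATALENGTH(k.Keyword) with DATALENGTH(TRIM(k.Keyword)) is one way to make the trailing space visible.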

Related

Aggregate function with window function filtered by time

I have a table with data about buses as they run their routes. It has columns for:
bus trip id (different each time a bus starts the route from the first stop)
bus stop id
datetime column that indicates the moment that the bus leaves each bus stop
integer that indicates how many passengers entered the bus in that stop
There is no information about how many passengers get off at each stop, so I have to estimate it by assuming that once passengers board the bus, they stay on it for 30 minutes. The trip takes about 70 minutes from the first stop to the last.
I am trying to aggregate results on each stop using
SUM(iPassengersIn) OVER (
PARTITION BY tripDate, tripId
ORDER BY busStopOrder
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) total_passengers
The problem is that this sums the passengers from the beginning of the trip, not just those who boarded in the 30 minutes before each stop. How can I limit the aggregation to "the last 30 minutes" for each row, in order to estimate the occupancy between stops?
This is a subset of my data:
trip_date trip_id bus_stop_order minutes_since_trip_start passengers_in trip_total_passengers
2020-06-08 374910 0 0 0 0
2020-06-08 374910 1 3 0 0
2020-06-08 374910 2 5 1 1
2020-06-08 374910 3 8 0 1
2020-06-08 374910 4 9 0 1
2020-06-08 374910 5 12 0 1
2020-06-08 374910 6 13 0 1
2020-06-08 374910 7 13 0 1
2020-06-08 374910 8 15 0 1
2020-06-08 374910 9 16 0 1
2020-06-08 374910 10 16 0 1
2020-06-08 374910 11 17 0 1
2020-06-08 374910 12 18 2 3
2020-06-08 374910 13 20 0 3
2020-06-08 374910 14 22 0 3
2020-06-08 374910 15 24 0 3
2020-06-08 374910 16 25 0 3
2020-06-08 374910 17 28 2 5
2020-06-08 374910 18 30 1 6
2020-06-08 374910 19 31 0 6
2020-06-08 374910 20 33 0 6
2020-06-08 374910 21 41 3 9
2020-06-08 374910 22 44 3 12
2020-06-08 374910 23 45 4 16
2020-06-08 374910 24 48 2 18
2020-06-08 374910 25 48 2 20
2020-06-08 374910 26 50 0 20
2020-06-08 374910 27 51 0 20
2020-06-08 374910 28 51 0 20
2020-06-08 374910 29 53 0 20
2020-06-08 374910 30 55 0 20
2020-06-08 374910 31 58 0 20
For the row with bus_stop_order 21 (41 minutes into the bus trip), where 3 passengers enter the bus, I have to sum only the passengers that entered the bus between minute 11 and 41. Thus, the passenger that entered the bus in the 2nd bus stop (5 minutes into the trip) should be excluded.
The same rule should apply to every row.
The only thing I can think of is:
select
trip_date,
trip_id,
minutes_since_trip_start,
v.total_passengers
from
#t t1
outer apply (
select sum(passengers_in)
from #t t2
where
t1.trip_date = t2.trip_date
and t1.trip_id = t2.trip_id
and t2.bus_stop_order <= t1.bus_stop_order
and t2.minutes_since_trip_start >= t1.minutes_since_trip_start - 30
) v(total_passengers)
order by
trip_date,
trip_id,
minutes_since_trip_start
;
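For comparison, here is the same 30-minute window computed procedurally in Python. It is a sketch that assumes one trip's rows arrive sorted by bus_stop_order and that minutes_since_trip_start never decreases along the trip (true for the sample data); rolling_onboard is a hypothetical name. Like the OUTER APPLY query, it counts passengers who boarded at or before the current stop and no more than 30 minutes earlier.

```python
from collections import deque


def rolling_onboard(rows, window_minutes=30):
    """rows: (minutes_since_trip_start, passengers_in) pairs for one trip, in bus_stop_order.
    Returns, per stop, the passengers who boarded within the last `window_minutes` minutes."""
    window = deque()  # boardings still inside the window, oldest first
    total = 0
    estimates = []
    for minute, boarded in rows:
        window.append((minute, boarded))
        total += boarded
        # Drop boardings older than (current minute - window), mirroring
        # t2.minutes_since_trip_start >= t1.minutes_since_trip_start - 30.
        while window[0][0] < minute - window_minutes:
            total -= window.popleft()[1]
        estimates.append(total)
    return estimates


# Trip 374910 from the question: (minutes_since_trip_start, passengers_in) for stops 0..31.
trip = [(0, 0), (3, 0), (5, 1), (8, 0), (9, 0), (12, 0), (13, 0), (13, 0), (15, 0), (16, 0),
        (16, 0), (17, 0), (18, 2), (20, 0), (22, 0), (24, 0), (25, 0), (28, 2), (30, 1), (31, 0),
        (33, 0), (41, 3), (44, 3), (45, 4), (48, 2), (48, 2), (50, 0), (51, 0), (51, 0), (53, 0),
        (55, 0), (58, 0)]
estimates = rolling_onboard(trip)
print(estimates[21])  # 8: the passenger who boarded at minute 5 is no longer counted
```

Running it once per (trip_date, trip_id) group gives the same numbers the OUTER APPLY computes row by row, but in a single pass.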

How do I perform shell sort using sequence {3,2,1}?

Suppose I have an array:
30 20 29 19 28 18 27 17 26 16 25 15 24 14 23 13 22 12 21 11
I don't understand how to do the Shell sort pass with gap 3.
Would I simply do this:
30 20 29 19 28 18 27 17 | 26 16 25 15 24 14 | 23 13 22 12 21 11
That is, split the array into 3 parts and sort each part? And then do the same with 2, except splitting it into halves? What is the right way to do this? Can someone please explain?
If you look at your array and number the positions:
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
30 20 29 19 28 18 27 17 26 16 25 15 24 14 23 13 22 12 21 11
In a Shell sort, you start with a gap (in your case 3). To make the first "list", you take an element and then every third element after it: with a gap of 3 this is the 1st, 4th, 7th, and so on.
So you would have a first list of
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
30       19       27       16       24       13       21
and a second list of
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
   20       28       17       25       14       22       11
The third list is the remaining items (positions 3, 6, 9, ..., 18). Each of these lists is sorted on its own.
For the next round you use a gap one smaller (2), so the lists are the items at odd-numbered positions and the items at even-numbered positions. The last round, with gap 1, is an ordinary insertion sort over the whole array.
In response to the comment below:
A Shell sort is an in-place sort: you don't move the items into new lists or create any new data structures. You use the array to "treat" items that are far apart in terms of array positions as if they were next to each other. You don't actually make new lists or new arrays (that is why I drew my diagrams the way I did); you just look at those positions.
Why?
Because when you start with a large gap (3, for example), items move a long way at once: the 13 that starts at position 16 gets moved to position 1 in the first pass. Then, as you reduce the gap, the changes become more and more local. This gives Shell sort an advantage over a typical bubble sort. Still not a fast sort, but MUCH better than a bubble sort.
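A minimal Python sketch of the passes described above, using the gap sequence 3, 2, 1 from the question. Each pass is a gapped insertion sort done in place, so the interleaved lists are sorted without ever being copied out; shell_sort and its list of snapshots are illustrative only.

```python
def shell_sort(a, gaps=(3, 2, 1)):
    """Sort list `a` in place; return a copy of the list after each gap pass.
    The gap sequence must end with 1, otherwise the result may not be fully sorted."""
    snapshots = []
    for gap in gaps:
        # Gapped insertion sort: positions i, i+gap, i+2*gap, ... form one "list",
        # and each element is shifted back past larger elements of its own list.
        for i in range(gap, len(a)):
            value = a[i]
            j = i
            while j >= gap and a[j - gap] > value:
                a[j] = a[j - gap]
                j -= gap
            a[j] = value
        snapshots.append(list(a))
    return snapshots


data = [30, 20, 29, 19, 28, 18, 27, 17, 26, 16, 25, 15, 24, 14, 23, 13, 22, 12, 21, 11]
after3, after2, after1 = shell_sort(data)
print(after3)  # [13, 11, 12, 16, 14, 15, 19, 17, 18, 21, 20, 23, 24, 22, 26, 27, 25, 29, 30, 28]
```

The first snapshot shows 13 already at the front after the gap-3 pass, which is the long-distance move the answer describes.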

Data field shifting through a vector of data in MATLAB

I need to create a data field (a fixed-length window) that moves through a vector. The field has a constant length, and it moves through the data vector in steps equal to its own length. For each position I need the mean value of the field in vector A and the mean value of the corresponding field in vector B.
Example:
A=[1 5 7 8 9 10 11 13 15 18 19 25 28 30 35 40 45 48 50 51];
B=[2 4 8 9 12 15 16 18 19 20 25 27 30 35 39 40 45 48 50 55];
I want to do the following:
A=[{1 5 7 8 9} 10 11 13 15 18 19 25 28 30 35 40 45 48 50 51];
B=[{2 4 8 9 12} 15 16 18 19 20 25 27 30 35 39 40 45 48 50 55];
I want to take the 5 data points in the field and get their mean value, then shift the whole field forward by the field length:
A=[1 5 7 8 9 {10 11 13 15 18} 19 25 28 30 35 40 45 48 50 51];
B=[2 4 8 9 12 {15 16 18 19 20} 25 27 30 35 39 40 45 48 50 55];
I need two vectors, C and D, with the mean values computed this way:
C=[6 13.4 27.4 46.8];
D=[7 17.6 31.2 47.6];
I started something with
n = length(A);
for k = 1:n
....
but nothing I tried worked.
Reshape the vector into a 5-row matrix and then compute the mean of each column (this requires the length of the vector to be a multiple of 5):
C = mean(reshape(A,5,[]),1);
D = mean(reshape(B,5,[]),1)
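The same non-overlapping block mean, written out as a small Python sketch to show what the reshape/mean pair computes. block_means is a hypothetical helper; like reshape, it requires the length to be a multiple of the block size.

```python
from statistics import fmean


def block_means(values, block=5):
    """Means of consecutive, non-overlapping blocks of `block` values,
    the equivalent of MATLAB's mean(reshape(values, block, []), 1)."""
    if len(values) % block:
        raise ValueError("length must be a multiple of the block size")
    return [fmean(values[i:i + block]) for i in range(0, len(values), block)]


A = [1, 5, 7, 8, 9, 10, 11, 13, 15, 18, 19, 25, 28, 30, 35, 40, 45, 48, 50, 51]
B = [2, 4, 8, 9, 12, 15, 16, 18, 19, 20, 25, 27, 30, 35, 39, 40, 45, 48, 50, 55]
C = block_means(A)  # approximately [6, 13.4, 27.4, 46.8]
D = block_means(B)  # approximately [7, 17.6, 31.2, 47.6]
```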

XOR File Decryption

I have to decrypt a .txt file that was encrypted by XOR with an unknown, repeating password; the goal is to recover the message.
Here is what I already know from the professor:
First I need to find the length of the unknown password
The message has been altered and it doesn't have spaces (this may add a bit more difficulty because the space character has the highest frequency in a message)
Any ideas on how to solve this?
Thanks in advance :)
First you need to find out the length of the password. You do this with the Index of Coincidence, or Kappa test. XOR the ciphertext with itself shifted 1 step and count the number of positions where the characters are the same (value 0). You get the Kappa value by dividing that count by the number of positions compared (the total number of characters minus the shift, so minus 1 for the first shift). Shift one more time and calculate the Kappa value again. Keep shifting as many times as needed until you discover the password length. If the length is 4 you should see something similar to this:
Offset Hits
-------------------------
1 2.68695%
2 2.36399%
3 3.79009%
4 6.74012%
5 3.6953%
6 1.81582%
7 3.82744%
8 6.03504%
9 3.60273%
10 1.98052%
11 3.83241%
12 6.5627%
As you can see, the Kappa value is significantly higher at multiples of 4 (4, 8 and 12) than at the other offsets. This suggests that the length of the password is 4.
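The Kappa test can be sketched in a few lines of Python. This assumes the ciphertext is available as bytes and that the shift is smaller than its length; kappa and kappa_table are hypothetical helper names. For shift s it compares byte i with byte i + s and divides the number of equal pairs by the number of pairs compared (n - s).

```python
def kappa(ciphertext: bytes, shift: int) -> float:
    """Fraction of positions where c[i] XOR c[i + shift] == 0.
    Assumes 0 < shift < len(ciphertext)."""
    pairs = len(ciphertext) - shift
    hits = sum(1 for i in range(pairs) if ciphertext[i] ^ ciphertext[i + shift] == 0)
    return hits / pairs


def kappa_table(ciphertext: bytes, max_shift: int = 12) -> dict:
    """Kappa value for every offset 1..max_shift; peaks at multiples of the
    password length suggest that length."""
    return {shift: kappa(ciphertext, shift) for shift in range(1, max_shift + 1)}
```

To print a table like the one above, loop over the result: `for shift, k in kappa_table(data).items(): print(f"{shift:<6} {100 * k:.5f}%")`.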
Now that you have the password length, XOR the ciphertext with itself again, but this time shift it by multiples of the length. Why? Because the ciphertext looks like this:
THISISTHEPLAINTEXT <- Plaintext
PASSPASSPASSPASSPA <- Password
------------------
EJKELDOSOSKDOWQLAG <- Ciphertext
XOR:ing two equal values gives 0. So when you XOR the ciphertext with itself shifted by 4:
EJKELDOSOSKDOWQLAG <- Ciphertext
EJKELDOSOSKDOWQLAG <- Ciphertext shifted 4.
Is in reality:
THISISTHEPLAINTEXT <- Plaintext
PASSPASSPASSPASSPA <- Password
THISISTHEPLAINTEXT <- Plaintext
PASSPASSPASSPASSPA <- Password
Which is:
THISISTHEPLAINTEXT <- Plaintext
THISISTHEPLAINTEXT <- Plaintext
As you can see, the password "disappears" and the plaintext is XOR:ed with itself.
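The cancellation can be checked directly. A short Python sketch, assuming a repeating-key XOR where byte i of the ciphertext is plaintext[i] XOR password[i % len(password)]; xor_repeat is a hypothetical helper, and the plaintext and password are the placeholders from the diagram.

```python
def xor_repeat(data: bytes, password: bytes) -> bytes:
    """Repeating-key XOR: byte i is XOR:ed with password[i % len(password)].
    Applying it twice with the same password gives the original data back."""
    return bytes(b ^ password[i % len(password)] for i, b in enumerate(data))


plaintext = b"THISISTHEPLAINTEXT"
password = b"PASS"
ciphertext = xor_repeat(plaintext, password)

shift = len(password)  # any multiple of the password length works
# C[i] ^ C[i+shift] == (P[i] ^ K) ^ (P[i+shift] ^ K) == P[i] ^ P[i+shift]: the password cancels out.
lhs = [ciphertext[i] ^ ciphertext[i + shift] for i in range(len(ciphertext) - shift)]
rhs = [plaintext[i] ^ plaintext[i + shift] for i in range(len(plaintext) - shift)]
print(lhs == rhs)  # True
```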
So what can we do now? You wrote that the spaces are removed. This makes it a bit harder to recover the plaintext or the password, but not at all impossible.
The following table shows the XOR of every pair of uppercase English letters:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
A 0
B 3 0
C 2 1 0
D 5 6 7 0
E 4 7 6 1 0
F 7 4 5 2 3 0
G 6 5 4 3 2 1 0
H 9 10 11 12 13 14 15 0
I 8 11 10 13 12 15 14 1 0
J 11 8 9 14 15 12 13 2 3 0
K 10 9 8 15 14 13 12 3 2 1 0
L 13 14 15 8 9 10 11 4 5 6 7 0
M 12 15 14 9 8 11 10 5 4 7 6 1 0
N 15 12 13 10 11 8 9 6 7 4 5 2 3 0
O 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
P 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 0
Q 16 19 18 21 20 23 22 25 24 27 26 29 28 31 30 1 0
R 19 16 17 22 23 20 21 26 27 24 25 30 31 28 29 2 3 0
S 18 17 16 23 22 21 20 27 26 25 24 31 30 29 28 3 2 1 0
T 21 22 23 16 17 18 19 28 29 30 31 24 25 26 27 4 5 6 7 0
U 20 23 22 17 16 19 18 29 28 31 30 25 24 27 26 5 4 7 6 1 0
V 23 20 21 18 19 16 17 30 31 28 29 26 27 24 25 6 7 4 5 2 3 0
W 22 21 20 19 18 17 16 31 30 29 28 27 26 25 24 7 6 5 4 3 2 1 0
X 25 26 27 28 29 30 31 16 17 18 19 20 21 22 23 8 9 10 11 12 13 14 15 0
Y 24 27 26 29 28 31 30 17 16 19 18 21 20 23 22 9 8 11 10 13 12 15 14 1 0
Z 27 24 25 30 31 28 29 18 19 16 17 22 23 20 21 10 11 8 9 14 15 12 13 2 3 0
What does this mean? If A and B are XOR:ed, the resulting value is 3; E and P result in 21; and so on. But how does this help you?
Remember that the plaintext is XOR:ed with itself shifted by multiples of the password length. For each value you can look it up in the table above and determine which letter combinations that position could have. Let's say the value is 25: then the two characters that produced it could be any of the following combinations: (I-P), (H-Q), (K-R), (J-S), (M-T), (L-U), (O-V), (N-W), (A-X) or (C-Z). But which one? Now you do more shifts and look up the corresponding values in the table again for each position. Next time the value might be 7, and since you already have a list of possible characters you only check against them. At the next two shifts the values are 3 and 1. W is consistent with every shift: (N-W), (P-W), (T-W), (V-W). Other letters can be consistent too (H, for instance, gives (H-Q), (H-O), (H-K), (H-I)), so you also check the partner positions and keep only the combinations that fit everywhere. You can do this for most positions.
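Building and querying the table can be sketched in Python as well. This assumes uppercase A-Z plaintext, as in the table; letter_pairs and candidates are hypothetical helpers.

```python
from string import ascii_uppercase


def letter_pairs(value: int):
    """All unordered pairs of uppercase letters whose XOR equals `value` (one cell of the table)."""
    return [(a, b) for i, a in enumerate(ascii_uppercase)
            for b in ascii_uppercase[i + 1:] if ord(a) ^ ord(b) == value]


def candidates(values):
    """Letters x such that, for every observed XOR value v at this position,
    x XOR v is also a letter. Each extra shift can only shrink this set."""
    letters = set(ascii_uppercase)
    return sorted(x for x in letters
                  if all(chr(ord(x) ^ v) in letters for v in values))


print(letter_pairs(25))           # the 10 combinations listed above
print(candidates([25, 7, 3, 1]))  # still H..W, so other positions must break the tie
```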
You will not get all of the plaintext, but you will get enough characters to discover the password. Take the known characters and XOR them with the ciphertext at the corresponding positions; this yields the password. You need at least as many known characters as the password is long, and they must cover every position of the password (every offset modulo the password length).
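Recovering the password from known plaintext characters is one XOR per position. A Python sketch, assuming you know the password length and some (position, plaintext byte) pairs; recover_password is a hypothetical helper that leaves unknown password bytes as None.

```python
def recover_password(ciphertext: bytes, known: dict, length: int):
    """known maps ciphertext positions to plaintext bytes you have identified.
    Since c[i] = p[i] ^ k[i % length], each known pair gives k[i % length] = c[i] ^ p[i]."""
    password = [None] * length
    for pos, plain_byte in known.items():
        password[pos % length] = ciphertext[pos] ^ plain_byte
    return password


# Demo with the diagram's placeholder plaintext and password.
plaintext, key = b"THISISTHEPLAINTEXT", b"PASS"
ciphertext = bytes(p ^ key[i % len(key)] for i, p in enumerate(plaintext))
known = {4: plaintext[4], 5: plaintext[5], 10: plaintext[10], 7: plaintext[7]}  # one per offset mod 4
print(bytes(recover_password(ciphertext, known, 4)))  # b'PASS'
```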
Good luck!
You should look at cracking a Vigenère cipher, especially at auto-correlation. The latter will help you find the length of the password, and the rest is usually just brute-forcing against the normal distribution of letters (the most common one in English being the letter e).
Although spaces are the most common characters and make decryptions like this easy, the other characters also have distinct frequencies. For example, see this Wikipedia article. If you've got enough encrypted text and the password length isn't too large, it might be enough just to find the most common bytes in the encrypted text. They will most likely be the encrypted versions of e, which has the highest frequency in English texts.
This alone won't give you the decrypted text, but it's very likely you can find out the password length and (part of) the password itself with it. For example, let's assume the most frequent encrypted bytes are
w x m z y
with almost the same frequency and there's a significant drop in frequency after the last one. This will tell you two things:
The password length is most likely 5, because statistically every encrypted e will be equally likely. EDIT: OK, this isn't quite correct: the length will be 5 or more, because the password can contain the same character multiple times.
The password will be some permutation of (w x m z y XOR e e e e e) - you can use the byte offsets modulo the password length to get the correct permutation.
EDIT: The same character occurring in the password multiple times makes things a bit harder, but you'll most likely be able to identify those cases because, as I said, the encrypted versions of e will cluster around some frequency f; if a character occurs n times in the password, its encrypted e will have a frequency near n*f.
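The frequency idea can be sketched by splitting the ciphertext into columns by offset modulo a candidate password length and assuming the most common plaintext byte in each column is e. guess_password is a hypothetical helper; because each column is handled separately, repeated password characters are not a problem, but each column needs enough text for the "most common is e" assumption to hold.

```python
from collections import Counter


def guess_password(ciphertext: bytes, length: int, common: int = ord("e")) -> bytes:
    """For each offset modulo `length`, take the most frequent ciphertext byte and
    assume it is the encryption of `common` ('e' by default)."""
    password = bytearray()
    for offset in range(length):
        column = ciphertext[offset::length]
        most_common_byte, _ = Counter(column).most_common(1)[0]
        password.append(most_common_byte ^ common)
    return bytes(password)


sample = b"eeeeeeabc"  # toy text where 'e' dominates every column
secret = b"key"
cipher = bytes(b ^ secret[i % len(secret)] for i, b in enumerate(sample))
print(guess_password(cipher, 3))  # b'key'
```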
The most common trigram in English (assuming the language is English) is "the". Place "the" at every possible position in your ciphertext to derive a possible 3 characters of the key. Try each possible key fragment at all other possible positions in the ciphertext and see what you get. For example, "qzg" is unlikely to be correct, but "fen" could be. Look at the spacing between possible positions to derive the key length. With a key length and a key fragment you can place a lot more of the key.
As Lars said, look at ways of decrypting Vigenère, which is effectively what you have here.
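The crib-dragging step with "the" can be sketched in Python. crib_fragments is a hypothetical helper: at every offset it XORs the crib with the ciphertext, which yields the password bytes that would be needed there if the crib were correct. Plausible-looking fragments, and identical fragments at offsets a multiple of the key length apart, are the hints described above.

```python
def crib_fragments(ciphertext: bytes, crib: bytes = b"the"):
    """Return (offset, key_fragment) for every offset where the crib fits.
    If the plaintext really contains the crib at that offset, the fragment is
    the run of password bytes starting at offset % password_length."""
    return [(offset, bytes(c ^ k for c, k in zip(ciphertext[offset:offset + len(crib)], crib)))
            for offset in range(len(ciphertext) - len(crib) + 1)]


secret = b"KEY"
plaintext = b"abthecd"
cipher = bytes(b ^ secret[i % len(secret)] for i, b in enumerate(plaintext))
fragments = dict(crib_fragments(cipher))
print(fragments[2])  # b'YKE': key bytes at offsets 2, 3, 4, i.e. key[2], key[0], key[1]
```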

Need a hint with a custom Linux/UNIX command-line utility "cal" in C

OK, I need to make this program display "cal" for 3 months (one month before and one month after) side by side, rather than the single month that cal displays on Linux/UNIX. I got it to display 3 calendars by calling "system(customCommand)" three times, but then they are not side by side.
I was given a hint to use the following system calls:
close(..) pipe(..) dup2(..) read(..) and write(..)
My question is: what should I start with? Do I need to create a child process and then capture its output with pipe(..)?
How can I display three calendars side by side?
For example:
   February 2009           March 2009            April 2009
 S  M Tu  W Th  F  S   S  M Tu  W Th  F  S   S  M Tu  W Th  F  S
 1  2  3  4  5  6  7   1  2  3  4  5  6  7            1  2  3  4
 8  9 10 11 12 13 14   8  9 10 11 12 13 14   5  6  7  8  9 10 11
15 16 17 18 19 20 21  15 16 17 18 19 20 21  12 13 14 15 16 17 18
22 23 24 25 26 27 28  22 23 24 25 26 27 28  19 20 21 22 23 24 25
                      29 30 31              26 27 28 29 30
Assuming you want to write it yourself instead of using "cal -3", here's what I'd do (in pseudocode):
popen three calls to "cal" with the appropriate args
while (at least one of the three pipes hasn't hit EOF yet)
{
read a line from the first if it isn't at EOF
pad the results out to a width W, print it
read a line from the second if it isn't at EOF
pad the results out to a width W, print it
read a line from the third if it isn't at EOF
print it
print "\n"
}
pclose all three.
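The pseudocode translates fairly directly to Python, shown here as a sketch: subprocess.run stands in for popen, the zip_longest loop plays the role of reading until every pipe hits EOF, and the width W defaults to 22 (a 20-column calendar plus a 2-space gap). It assumes a cal on PATH that accepts `cal MONTH YEAR`; side_by_side and cal_side_by_side are hypothetical names.

```python
import subprocess
from itertools import zip_longest


def side_by_side(blocks, width=22):
    """Join multi-line text blocks column-wise: line k of every block goes on
    output line k, each block but the last padded to `width` (the pseudocode's W).
    Blocks that have run out of lines (EOF) contribute empty padding."""
    columns = [block.splitlines() for block in blocks]
    rows = []
    for parts in zip_longest(*columns, fillvalue=""):
        padded = [part.ljust(width) for part in parts[:-1]] + [parts[-1]]
        rows.append("".join(padded).rstrip())
    return "\n".join(rows)


def cal_side_by_side(months):
    """months: (month, year) pairs, e.g. [(2, 2009), (3, 2009), (4, 2009)].
    Runs `cal MONTH YEAR` once per pair and merges the outputs."""
    outputs = [subprocess.run(["cal", str(m), str(y)], capture_output=True,
                              text=True, check=True).stdout
               for m, y in months]
    return side_by_side(outputs)
```

For example, `print(cal_side_by_side([(2, 2009), (3, 2009), (4, 2009)]))` should print the three months next to each other. Keeping the merging in its own function means it can be tried without cal installed.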
If "cal -3" doesn't work, just use paste :)
$ TERM=linux setterm -regtabs 24
$ paste <(cal 2 2009) <(cal 3 2009) <(cal 4 2009)
febbraio 2009 marzo 2009 aprile 2009
do lu ma me gi ve sa do lu ma me gi ve sa do lu ma me gi ve sa
1 2 3 4 5 6 7 1 2 3 4 5 6 7 1 2 3 4
8 9 10 11 12 13 14 8 9 10 11 12 13 14 5 6 7 8 9 10 11
15 16 17 18 19 20 21 15 16 17 18 19 20 21 12 13 14 15 16 17 18
22 23 24 25 26 27 28 22 23 24 25 26 27 28 19 20 21 22 23 24 25
29 30 31 26 27 28 29 30
$
(setterm ignores -regtabs unless TERM=linux or TERM=con.)
Just do:
cal -3
Does this not work?
cal -3
OK, how about cal -3?
Use cal -3 12 2120 to pick a specific month and year, with one month before and one after.
The approach I would use for this would be to capture the output, split it into lines, and printf the lines out next to each other. I'd probably do it in Perl, though, rather than C.
Or just use cal -3, if your cal has it.
