Log Parser: HAVING Wildcards - logparser

I have a Log Parser query that gets the top 200 URIs, but I don't want any cs-uri-stem entries that contain a dot (.).
This is as close as I've come, but it seems like the wildcards are not acting as I expected:
"SELECT TOP 200 cs-uri-stem, COUNT(*) AS Total INTO \Top200URIs_NoDots.csv
FROM "\2015-01\U*.log"
GROUP BY cs-uri-stem
HAVING cs-uri-stem NOT LIKE '%.%'
ORDER BY Total DESC"
When I run this I get an Error:
... HAVING cs-uri-stem NOT LIKE ''...
Error: Syntax Error: <having-clause>: not a valid <expression>
Why is it ignoring the '%'s and everything between?

HAVING is for filtering grouped results, typically on aggregate values computed over each group. Filtering after grouping costs more because the grouping must be completed first, so this query is better served by a WHERE clause anyway. Also, remember to use %% if this is in a batch file. A single % denotes a batch variable, so '%.%' is expanded as a (nonexistent) variable and never reaches the program's arguments, which is why the error shows NOT LIKE ''.
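To make the two points above concrete, here is a minimal Python sketch that holds a WHERE-based version of the query and doubles each % so the text survives inside a .bat file. The helper name `escape_for_batch` is hypothetical, the paths are copied as-is from the question, and the rewritten query has not been run against real logs.

```python
# WHERE-based version of the query from the question; paths copied as-is.
QUERY = (
    "SELECT TOP 200 cs-uri-stem, COUNT(*) AS Total "
    "INTO \\Top200URIs_NoDots.csv "
    "FROM \\2015-01\\U*.log "
    "WHERE cs-uri-stem NOT LIKE '%.%' "
    "GROUP BY cs-uri-stem "
    "ORDER BY Total DESC"
)


def escape_for_batch(query: str) -> str:
    """Double every percent sign for use inside a batch file.

    Hypothetical helper: in a .bat file, %name% is read as a variable
    reference, while %% yields one literal % in the final command line.
    """
    return query.replace("%", "%%")


if __name__ == "__main__":
    # Print the text you would paste into the batch file.
    print(escape_for_batch(QUERY))
```

When the query is typed directly at an interactive prompt instead of a batch file, the doubling should not be needed.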

Snowflake: Pattern date search read from S3

Requirement: I need to fetch only the latest file every day. In this example, that is the 20200902 file.
Example Files in S3:
@stagename/2020/09/reporting_2020_09_20200902000335.gz
@stagename/2020/09/reporting_2020_09_20200901000027.gz
Code:
select distinct metadata$filename from
@stagename/2020/09/
(file_format=>APP_SKIP_HEADER,pattern=>'.*/reporting_*20200902*.gz');
The approach below will work regardless of the files' naming convention. Since your files appear to follow a date-based naming convention, with one file per point in time, you may not need the modification date at all and could use the name instead. You'll still want to use the result_scan approach.
I haven't found a way to get the date for a file in a stage other than using the LIST command. The docs say that FILE_NAME and FILE_ROW_NUMBER are the only available metadata in a select query. In any case, that approach reads the data, and we only want to read the metadata.
Since a LIST command is a metadata query, you'll need to query the result_scan to use a where clause.
One final issue that I ran into while working on a project: the last_modified date in the LIST command is in a format that requires a somewhat long conversion expression to convert to a timestamp. I made a UDF to do the conversion so that it's more readable. If you'd prefer putting the expression directly in the SQL, that's fine too.
First, create the UDF.
create or replace function LAST_MODIFIED_TO_TIMESTAMP(LAST_MODIFIED string)
returns timestamp_tz
as
$$
to_timestamp_tz(left(LAST_MODIFIED, len(LAST_MODIFIED) - 4) || ' ' || '00:00', 'DY, DD MON YYYY HH:MI:SS TZH:TZM')
$$;
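For readers who want to see what that expression does step by step, here is a rough Python equivalent. It assumes the last_modified value looks like "Wed, 02 Sep 2020 00:03:35 GMT" (a made-up sample) and that the trailing 4 characters are always " GMT". The function name simply mirrors the UDF and is not part of any Snowflake API.

```python
from datetime import datetime, timezone


def last_modified_to_timestamp(last_modified: str) -> datetime:
    """Parse a LIST-style last_modified string into an aware UTC datetime.

    Assumes the value ends in the 4-character suffix " GMT", which is
    dropped here just as the UDF drops it with left(..., len(...) - 4).
    """
    without_zone = last_modified[:-4]
    # %a and %b expect English day/month abbreviations (default C locale).
    parsed = datetime.strptime(without_zone, "%a, %d %b %Y %H:%M:%S")
    # The UDF appends a 00:00 offset; attaching UTC has the same effect.
    return parsed.replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    # Sample value is made up for illustration.
    print(last_modified_to_timestamp("Wed, 02 Sep 2020 00:03:35 GMT"))
```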
Next, list the files in your stage or subdirectory of the stage.
list @stagename/2020/09/;
Before running any other query in the session, run this one on the last query ID. You can of course run it any time within 24 hours if you specify the query ID explicitly.
select "name",
"size",
"md5",
"last_modified",
last_modified_to_timestamp("last_modified") LAST_MOD
from table(result_scan(last_query_id()))
order by LAST_MOD desc
limit 1
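As noted earlier, the file names themselves carry a timestamp, so the newest file can also be picked from the names alone. The following is a small client-side sketch in Python. It assumes every name ends in `_<14 digits>.gz`, as the two examples in the question do, and `latest_by_name` is a hypothetical helper, not a Snowflake feature.

```python
import re

# Assumed naming pattern: a 14-digit timestamp right before ".gz".
TIMESTAMP = re.compile(r"_(\d{14})\.gz$")


def latest_by_name(names):
    """Return the name with the largest embedded timestamp, or None."""
    stamped = []
    for name in names:
        match = TIMESTAMP.search(name)
        if match:
            # Fixed-width digit strings sort the same way as the dates.
            stamped.append((match.group(1), name))
    if not stamped:
        return None
    return max(stamped)[1]


if __name__ == "__main__":
    files = [
        "2020/09/reporting_2020_09_20200902000335.gz",
        "2020/09/reporting_2020_09_20200901000027.gz",
    ]
    print(latest_by_name(files))
```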

Using a variable within the FROM section of a Log Parser Lizard IIS log query

I'm trying to speed up my Log Parser Lizard queries to IIS logs on one of our servers.
This kind of query works but it's very slow:
SELECT TOP 100 * FROM '\\myserver\c$\inetpub\logs\LogFiles\W3SVC1\u_ex*.log'
ORDER BY time DESC
If I specify today's log filename that's a lot quicker:
SELECT TOP 100 * FROM '\\myserver\c$\inetpub\logs\LogFiles\W3SVC1\u_ex190731.log'
ORDER BY time DESC
I'm trying to find a way to achieve this without having to keep changing the filename within the query to match today's date. I can't find any way of using variables or functions like strcat within the FROM section of the query.
So in simpler terms, is there any way to inject today's date into a query like this:
SELECT * FROM 'C:\test\%DATE%.txt'
I found that LogParserLizard supports inline VB.NET code, specified using <% ... %> tags, so it was just a matter of inserting the date using that syntax and it worked fine.
SELECT TOP 100 *
FROM '\\myserver\c$\inetpub\logs\LogFiles\W3SVC1\u_ex<% return DateTime.Now.ToString("yyMMdd") %>.log'
ORDER BY time DESC
or in my simplified version it would be:
SELECT * FROM 'C:\test\<% return DateTime.Now.ToString("yyMMdd") %>.txt'
--converts to `C:\test\190731.txt`
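If the query text is generated outside Log Parser Lizard, for example by a script, the same idea can be sketched in Python. This is only an illustration: `todays_log_query` is a hypothetical helper, and the server path is the one from the question.

```python
from datetime import date

# Log folder copied from the question.
LOG_DIR = r"\\myserver\c$\inetpub\logs\LogFiles\W3SVC1"


def todays_log_query(today: date) -> str:
    """Build the query against the single log file for the given day."""
    # %y%m%d gives the same two-digit year, month, day as "yyMMdd".
    stamp = today.strftime("%y%m%d")
    return (
        f"SELECT TOP 100 * FROM '{LOG_DIR}\\u_ex{stamp}.log' "
        "ORDER BY time DESC"
    )


if __name__ == "__main__":
    print(todays_log_query(date.today()))
```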

How to do GROUP BY with COUNT() and ordering by COUNT in influxdb?

I'm using InfluxDB to record user visits to dictionary pages, and I'm trying to get some queries working.
For example, I'm trying to get a set of headwords sorted by the number of visits to each word's definition within some timeframe. So basically, words sorted by number of visits.
I'm trying something like this:
SELECT COUNT(*) FROM lookup GROUP BY word ORDER BY count_value LIMIT 100
But it doesn't work. The error message is "Server returned error: error parsing query: only ORDER BY time supported at this time".
Is what I'm trying to do not achievable in InfluxDB?
As noted by the error that was returned
Server returned error: error parsing query: only ORDER BY time supported at this time
InfluxDB only supports ORDER BY time at the moment. To achieve the result that you're looking for you'd need to do the ORDER BY client side.
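A minimal sketch of that client-side step in Python follows. It assumes the per-word counts from the GROUP BY query have already been collected into a plain dict (word to count). How you build that dict depends on your client library and is not shown; `top_words` is a hypothetical helper.

```python
def top_words(counts, limit=100):
    """Return (word, count) pairs, most visited first, capped at limit."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


if __name__ == "__main__":
    # Made-up counts standing in for the GROUP BY word results.
    sample = {"apple": 3, "banana": 10, "cherry": 1}
    for word, visits in top_words(sample, limit=2):
        print(word, visits)
```

Drop `reverse=True` if you want the least visited words first.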

How to set NOT clause on Date range query in Solr

Have been trying to understand this for a while ...
How can I specify a NOT clause in the following query?
{!field f=schedule op=Intersects}[2016-08-26T12:30:00Z TO 2016-08-26T18:30:00Z]
{!field f=schedule op=Contains}[2016-08-26T12:30:00Z TO 2016-08-26T18:30:00Z]
For example, without LocalParams, we can specify -DateField:[2016-08-26T12:30:00Z TO 2016-08-26T18:30:00Z] to get an equivalent NOT clause. But I need a NOT Contains date range query.
I have tried a few options but I end up getting parsing errors. Surely there must be some obvious way I am missing.

SQL Full Text Indexer, exact matches and escaping

I'm trying to replace a Keyword Analyser based Lucene.NET index with an SQL Server 2008 R2 based one.
I have a table that contains custom indexed fields that I need to query. The value of the index column (see below) is a combination of name/value pairs of the custom index fields from a series of .NET types. The actual values are pulled from attributes at run time, because the structure is unknown.
I need to be able to search for set name and value pairs, using ANDs and ORs and return the rows where the query matches.
Id Index
====================================================================
1 [Descriptor.Type]=[5][Descriptor.Url]=[/]
2 [Descriptor.Type]=[23][Descriptor.Url]=[/test]
3 [Descriptor.Type]=[25][Descriptor.Alternative]=[hello]
4 [Descriptor.Type]=[26][Descriptor.Alternative]=[hello][Descriptor.FriendlyName]=[this is a test]
A simple query looks like this:
select * from Indices where contains ([Index], '[Descriptor.Url]=[/]');
That query results in the following error:
Msg 7630, Level 15, State 2, Line 1
Syntax error near '[' in the full-text search condition '[Descriptor.Url]=[/]'.
So with that in mind, I altered the data in the Index column to use | instead of [ and ]:
select * from Indices where contains ([Index], '|Descriptor.Url|=|/|');
Now, while that query is valid, when I run it all rows containing Descriptor.Url and starting with / are returned, instead of only the record (exactly one in this case) that matches exactly.
My question is, how can I escape the query to account for the [ and ] and ensure that just the exact matching row is returned?
A more complex query looks a little like this:
select * from Indices where contains ([Index], '[Descriptor.Type]=[12] AND ([Descriptor.Url]=[/] OR [Descriptor.Url]=[/test])');
Thanks,
Kieron
Your main issue is with the SQL word breaker and the CONTAINS syntax. By default, SQL word breakers eliminate punctuation and normalize numbers, dates, URLs, email addresses, and the like. They also lowercase everything and stem words.
So, for your input string:
[Descriptor.Type]=[5][Descriptor.Url]=[/]
You would have the following tokens added to the index (along with their positions)
descriptor type nn5 5 descriptor url
(Note: the nn5 is a way to simplify querying numbers and dates given in different formats; the original number is also indexed at the same position.)
So, as you can see, the punctuation is not even stored in the full-text index, and thus there is no way to query it using the CONTAINS statement.
So your statement:
select * from Indices where contains ([Index], '|Descriptor.Url|=|/|');
Would actually be normalized down to "descriptor url" by the query generator before submitting it to the full text index, thus the hits on all the entries that have "descriptor" next to "url", excluding punctuation.
What you need is the LIKE statement.
Using "|" as your delimiter causes the CONTAINS query to interpret it as OR, which is why you are getting unexpected results. You should be able to escape the bracket like so:
SELECT * FROM Indices WHERE
contains ([Index], '[[]Descriptor.Type]=[[]12]')
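If you go the LIKE route mentioned above, the search value has to be escaped before it is placed in the pattern, because `[`, `%`, and `_` are wildcards in T-SQL LIKE. Here is a small Python sketch of that escaping, using bracket escapes such as `[[]`. The helper names are hypothetical, and the resulting pattern should be passed as a query parameter rather than concatenated into the SQL text.

```python
# Characters with special meaning in a T-SQL LIKE pattern, each mapped
# to its bracket-escaped form.
_LIKE_SPECIAL = {"[": "[[]", "%": "[%]", "_": "[_]"}


def escape_like(value: str) -> str:
    """Escape LIKE wildcards so the value is matched literally."""
    return "".join(_LIKE_SPECIAL.get(ch, ch) for ch in value)


def contains_pattern(value: str) -> str:
    """Build a LIKE pattern that matches the literal value anywhere."""
    return "%" + escape_like(value) + "%"


if __name__ == "__main__":
    # Intended for use as: ... WHERE [Index] LIKE <this pattern>
    print(contains_pattern("[Descriptor.Url]=[/]"))
```

Because the closing `]` after `/` is part of the literal text, this pattern should match row 1 in the sample table but not row 2, whose value is `[/test]`.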
