LogParser - Event log data value substitution

I'm not much of a scripting wiz, and I have a small requirement to analyse the Windows Security event logs for firewall traffic.
To that end I've started looking at LogParser, and it seems to do pretty much everything I need, but I'm having a little trouble working out how to substitute certain values extracted from the logs with something more readable.
My script is very simple:
SELECT
TimeGenerated AS Time,
EventTypeName AS Event,
EXTRACT_TOKEN(Strings, 0,'|') AS ProcessID,
EXTRACT_TOKEN(Strings, 1,'|') AS Process,
EXTRACT_TOKEN(Strings, 7,'|') AS Protocol,
EXTRACT_TOKEN(Strings, 2,'|') AS Direction,
EXTRACT_TOKEN(Strings, 3,'|') AS SourceAddress,
EXTRACT_TOKEN(Strings, 4,'|') AS SourcePort,
EXTRACT_TOKEN(Strings, 5,'|') AS DestinationAddress,
EXTRACT_TOKEN(Strings, 6,'|') AS DestinationPort
FROM Security
WHERE EventID IN (5152; 5153; 5154; 5155; 5156; 5157; 5158)
Although this produces the information I'm interested in, I'd like, if possible, to change the output. For example, the 'Process' column output is:
\device\harddiskvolume2\apps\mozilla\fx-4\firefox.exe
What I'd really like is to display just the process name, without the path. Likewise, the 'Protocol' column displays only the numeric protocol value; I'd prefer it to display the actual protocol name.
Lastly, the Direction column displays the numeric values %%14592 and %%14593, and I'd prefer to see In and Out respectively.
If anyone can help, I'd be most grateful.
Thanks

For your filename question, does the
EXTRACT_FILENAME(EXTRACT_TOKEN(Strings, 1,'|')) AS Process
work for you?
For your other issue, how about:
CASE EXTRACT_TOKEN(Strings, 2,'|') WHEN '%%14592' THEN 'IN' ELSE 'OUT' END As Direction
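Putting the pieces together, the whole query might look like this (a sketch: the protocol mapping assumes the usual IANA numbers such as 6 = TCP and 17 = UDP, and it assumes your LogParser version accepts multi-branch CASE expressions):
SELECT
TimeGenerated AS Time,
EventTypeName AS Event,
EXTRACT_TOKEN(Strings, 0,'|') AS ProcessID,
EXTRACT_FILENAME(EXTRACT_TOKEN(Strings, 1,'|')) AS Process,
CASE EXTRACT_TOKEN(Strings, 7,'|')
    WHEN '6' THEN 'TCP'
    WHEN '17' THEN 'UDP'
    WHEN '1' THEN 'ICMP'
    ELSE EXTRACT_TOKEN(Strings, 7,'|')
END AS Protocol,
CASE EXTRACT_TOKEN(Strings, 2,'|')
    WHEN '%%14592' THEN 'In'
    ELSE 'Out'
END AS Direction,
EXTRACT_TOKEN(Strings, 3,'|') AS SourceAddress,
EXTRACT_TOKEN(Strings, 4,'|') AS SourcePort,
EXTRACT_TOKEN(Strings, 5,'|') AS DestinationAddress,
EXTRACT_TOKEN(Strings, 6,'|') AS DestinationPort
FROM Security
WHERE EventID IN (5152; 5153; 5154; 5155; 5156; 5157; 5158)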


How to Implement Patterns to Match Brute Force Login and Port Scanning Attacks using Flink CEP

I have a use case where a large number of logs will be consumed by Apache Flink CEP. My use case is to detect brute force attacks and port scanning attacks. The challenge is that in ordinary CEP we compare a value against a constant, like event = "login". Here the criteria are different. For a brute force attack the criteria are as follows:
username is constant and event = "login failure" (the event happens 5 times within 5 minutes).
That is, logs with the login failure event are received for the same username 5 times within 5 minutes.
And for port scanning we have the following criteria:
IP address is constant and destination port is variable (the event happens 10 times within 1 minute). That is, logs with a constant IP address are received for 10 different ports within 1 minute.
With Flink, when you want to process the events for something like one username or one IP address in isolation, the way to do this is to partition the stream by a key, using keyBy(). The training materials in the Flink docs have a section on Keyed Streams that explains this part of the DataStream API in more detail. keyBy() is roughly the same concept as a GROUP BY in SQL, if that helps.
With CEP, if you first key the stream, then the pattern will be matched separately for each distinct value of the key, which is what you want.
However, rather than CEP, I would instead recommend Flink SQL, perhaps in combination with MATCH_RECOGNIZE, for this use case. MATCH_RECOGNIZE is a higher-level API, built on top of CEP, and it's easier to work with. In combination with SQL, the result is quite powerful.
You'll find some Flink SQL training materials and examples (including examples that use MATCH_RECOGNIZE) in Ververica's GitHub account.
Update
To be clear, I wouldn't use MATCH_RECOGNIZE for these specific rules; neither it nor CEP is needed for this use case. I mentioned it in case you have other rules where it would be helpful. (My reason for not recommending CEP in this case is that implementing the distinct constraint might be messy.)
For example, for the port scanning case you can do something like this:
SELECT e1.ip, COUNT(DISTINCT e2.port)
FROM events e1, events e2
WHERE e1.ip = e2.ip AND timestampDiff(MINUTE, e1.ts, e2.ts) < 1
GROUP BY e1.ip HAVING COUNT(DISTINCT e2.port) >= 10;
The login case is similar, but easier.
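For instance, a sketch using a tumbling window (same hypothetical events table as above, with a username column; a hopping window would also catch runs that straddle a window boundary):
SELECT username, COUNT(*) AS failures
FROM events
WHERE event = 'login failure'
GROUP BY username, TUMBLE(ts, INTERVAL '5' MINUTE)
HAVING COUNT(*) >= 5;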
Note that when working with streaming SQL, you should give some thought to state retention.
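For example, recent Flink versions expose idle state retention as a configuration option (the option name is version-dependent; older releases used TableConfig#setIdleStateRetentionTime instead):
SET 'table.exec.state.ttl' = '1 h';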
Further update
This query is likely to return a given IP address many times, and it's not desirable to generate multiple alerts.
This could be handled by inserting matching IP addresses into an Alert table, and only generating alerts for IPs that aren't already there.
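In generic SQL terms the idea is roughly this (table names are hypothetical, and streaming support for NOT EXISTS varies by engine and version):
INSERT INTO alerts (ip)
SELECT s.ip
FROM suspicious_ips AS s
WHERE NOT EXISTS (SELECT 1 FROM alerts AS a WHERE a.ip = s.ip);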
Or the output of the SQL query could be processed by a de-duplicator implemented using the DataStream API, similar to the example in the Flink docs. If you only want to suppress duplicate alerts for some period of time, use a KeyedProcessFunction instead of a RichFlatMapFunction, and use a Timer to clear the state when it's time to re-enable alerts for a given IP.
Yet another update (concerning CEP and distinctness)
Implementing this with CEP should be possible. You'll want to key the stream by the IP address, and have a pattern that has to match within one minute.
The pattern can be roughly like this:
Pattern<Event, ?> pattern = Pattern
    .<Event>begin("distinctPorts")
    .where(iterativeCondition1) // described below
    .oneOrMore()
    .followedBy("end")
    .where(iterativeCondition2) // described below
    .within(Time.minutes(1));
The first iterative condition returns true if the event being added to the pattern has a distinct port from all of the previously matching events. Somewhat similar to the example here, in the docs.
The second iterative condition returns true if size("distinctPorts") >= 9 and this event also has yet another distinct port.
See this Flink Forward talk (YouTube video) for a somewhat similar example near the end of the talk.
If you try this and get stuck, please ask a new question, showing us what you've tried and where you're stuck.

How can I get Watson to recognize two different dates upon user input?

If a user asks the following sentence:
For some reason Watson uses the first date for both the $checkin and $checkout variables, even though it detects the second date.
You can refer to the "dialog node" screenshot to see how the nodes are setup.
How can I get Watson to recognize that the first date is the check-in date and the second one is the check-out date? Is there a way I could tell Watson, after the first date is used, to fill the next slot with the second date if one is detected?
I've found something about the #sys-date range_link entity, but the documentation is not detailed.
This is easy to do, but comes with issues you need to be aware of.
Slots allow you to define variables as they are read. For example:
Will generate this:
The issue is that you are assuming people will ask in the same order that you need the information. If that doesn't happen, this will fail.
You can mitigate this by shaping how the user asks the question. For example:
"Please let me know where you are leaving and going to"
The person is more likely to respond with the exit date first.
BETA
This is likely to change and doesn't yet fully work as you'd expect. You can enable the beta #sys-date in the options, but I wouldn't recommend relying on it until it is final.
You first need to check for range_link. This will tell you if it detected that two dates are connected to each other.
Then you can do the following:
From Date: <? entities['sys-date'].filter("d", "d.role.type == 'date_from'")[0]?.value ?>
To Date: <? entities['sys-date'].filter("d", "d.role.type == 'date_to'")[0]?.value ?>
What this does is find the exact record that has the role of date_from and returns the value. Likewise for date_to.
You end up with something like this.

Adding Geo-IP data to Query in Access SQL

OK, this sounds like a problem that someone else should have solved already, but I can't find any help on it, just that there should be a better way of doing it than using a non-equi join.
I have a log file of session info with a source IP, and I am trying to create a query, one that actually runs, to combine the log file with Geo-IP data to tell the DB where users are connecting from. My first attempt came to this:
SELECT coco, region, city
FROM GT_Geo_IP
WHERE (IP_Start <= [IntIP] AND IP_End >=[IntIP])
ORDER BY IP_Start;
It seemed to run quite quickly and returns the correct record for a given IP, but when I tried to combine it with the log data, like this:
SELECT T.IP, G.coco, G.region, G.city
FROM GT_Geo_IP as G, Log_Table as T
WHERE G.IP_Start <= T.IntIP AND G.IP_End >= T.IntIP
ORDER BY T.IP;
it locks Access for over 45 minutes (pegging one of my CPU cores) before I finally decide I need some CPU back, or that I should actually have a go at something else. From hunting around (this is apparently slower than I realised), I found this article, indexed both IP_Start and IP_End to optimize the search, and based on it came up with this:
SELECT TOP 1 coco, region, city
FROM GT_Geo_IP
WHERE IP_Start <= [IntIP]
ORDER BY IP_Start DESC;
But with my SQL skills, I can't work out how to combine it with my log data.
Basically, the question is: how do I use this better method with my log data to get the required result? Or is there a better way to do it?
The Geo-IP data is from IP2Location's LITE-DB3.
I have thought about nested queries, but I couldn't work out how to construct one. I've also thought about using VBA, but I'm not sure it would be any quicker.
@EnviableOne
Please try this SQL statement. For each log row, the correlated subquery grabs the Geo-IP record with the highest IP_Start at or below that row's address. Note that Access SQL has no CONCAT function, so the concatenation uses the & operator:
SELECT
    L.IP,
    (
        SELECT TOP 1
            coco & ', ' & region & ', ' & city
        FROM
            GT_Geo_IP
        WHERE
            IP_Start <= L.intIP
        ORDER BY
            IP_Start DESC
    ) AS Location
FROM
    Log_Table AS L;
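The correlated subquery turns the non-equi join into one indexed seek per log row. If the index from the article isn't already in place, it is one line of DDL (the index name is made up):
CREATE INDEX idx_IP_Start ON GT_Geo_IP (IP_Start);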

SQL Query Notifications and GetDate()

I am currently working on a query that is registered for Query Notifications. In accordance with the rules of Notification Services, I can only use deterministic functions in queries set up for subscription. However, GETDATE() (and almost any other means I can think of) is non-deterministic. Whenever I pull my data, I would like to be able to limit the result set to only the relevant records, which is determined by the current day.
Does anyone know of a work around that I could use that would allow me to use the current date to filter my results but not invalidate the query for query notifications?
Example Code:
SELECT fcDate as RecordDate, fcYear as FiscalYear, fcPeriod as FiscalPeriod, fcFiscalWeek as FiscalWeek, fcIsPeriodEndDate as IsPeriodEnd, fcPeriodWeek as WeekOfPeriod
FROM dbo.bFiscalCalendar
WHERE fcDate >= GetDate() -- This line invalidates the query for notification...
Other thoughts:
We have an application controls table in our database that we use to store application-level settings. I had thought to write a small script that keeps a record up to date with the current smalldatetime. However, my join to this table is failing for notification as well, and I am not sure why. I surmise that it has something to do with me specifying a text type (the column name), which is frustrating.
Example Code 2:
SELECT fcDate as RecordDate, fcYear as FiscalYear, fcPeriod as FiscalPeriod, fcFiscalWeek as FiscalWeek, fcIsPeriodEndDate as IsPeriodEnd, fcPeriodWeek as WeekOfPeriod
FROM dbo.bFiscalCalendar
INNER JOIN dbo.xApplicationControls ON fcDate >= acValue AND acName = N'Cache_CurrentDate'
Does anyone have any suggestions?
EDIT: Here is a link on MSDN that gives the rules for Notification Services
As it turns out, I figured out the solution. Basically, I was invalidating my query attempts because I was casting a value as a DateTime, which marks the query as non-deterministic. Even if you don't specifically call out a cast but do something akin to:
RecordDate = 'date_string_value'
you still end up with a date cast. Hopefully this will help out someone else who hits this issue.
This link helped me quite a bit.
http://msdn.microsoft.com/en-us/library/ms178091.aspx
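To illustrate (per that MSDN page, converting a string to datetime counts as deterministic only when an explicit style is specified; 112 is yyyymmdd):
-- Implicit string-to-datetime cast: treated as non-deterministic
SELECT fcDate FROM dbo.bFiscalCalendar WHERE fcDate >= '20090601';
-- Explicit CONVERT with a style code: deterministic
SELECT fcDate FROM dbo.bFiscalCalendar WHERE fcDate >= CONVERT(datetime, '20090601', 112);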
A good way to bypass this is simply to create a view that just says "SELECT GetDate() AS Now", then use the view in your query.
EDIT: I see nothing there about not using user-defined functions (which is where I've used this 'view today' trick). So can you use a UDF in the query that points at the view?
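A minimal sketch of that idea (the view name vNow is made up, and it's worth checking against the notification rules, which restrict what a subscribed statement may reference):
CREATE VIEW dbo.vNow
AS
SELECT GETDATE() AS Now;
-- Then, instead of calling GETDATE() in the subscribed query:
SELECT fc.fcDate
FROM dbo.bFiscalCalendar AS fc, dbo.vNow AS n
WHERE fc.fcDate >= n.Now;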

Autocomplete Dropdown - too much data, timing out

So, I have an autocomplete dropdown with a list of townships. Initially I just had the 20 or so that we had in the database... but recently we have noticed that some of our data lies in other counties... even other states. So the answer to that was to buy one of those databases with all the towns in the US (yes, I know, geocoding is the answer, but due to time constraints we are doing this until we have time for that feature).
So, when we had 20-25 towns the autocomplete worked stellarly... now that there are 80,000, it's not as easy.
As I type, I am thinking that the best way to do this is to default to this state; then there will be much less data. I will add a state selector to the page that defaults to NJ, and you can pick another state if need be; this will narrow the list down to fewer than 1,000. Though I may still have the same issue. Does anyone know of a workaround for an autocomplete with a lot of data?
should I post teh codez of my webservice?
Are you trying to autocomplete after only 1 character is typed? Maybe wait until 2 or more...?
Also, can you just return the top 10 rows, or something?
Sounds like your application is suffocating on the amount of data being returned and then rendered by the browser.
I assume that your database has the proper indexes and that you don't have a performance problem there.
I would limit the results of your service to no more than, say, 100 results. Users will not look at any more than that anyhow.
I would also only begin retrieving data from the service once 2 or 3 characters are entered, which will further reduce the scope of the query.
Good Luck!
Stupid question maybe, but... have you checked to make sure you have an index on the town name column? I wouldn't think 80K names should be stressing your database...
I think you're on the right track. Use a series of cascading inputs, State -> County -> Township where each succeeding one grabs the potential population based on the value of the preceding one. Each input would validate against its potential population to avoid spurious inputs. I would suggest caching the intermediate results and querying against them for the autocomplete instead of going all the way back to the database each time.
If you have control of the underlying SQL, you may want to try several UNION queries instead of one query with several "OR LIKE" lines in its WHERE clause.
Check out this article on optimizing SQL.
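A sketch of the UNION idea (table and column names are made up; the point is that each branch can use its own index, where a single OR'd predicate often can't):
SELECT name FROM towns WHERE name LIKE @term + '%'
UNION
SELECT name FROM towns WHERE alt_name LIKE @term + '%';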
I'd just limit the SQL query with a TOP clause. I also like using a "less than" comparison instead of LIKE:
SELECT TOP 10 name FROM cities WHERE @partialname < name ORDER BY name;
That way "Ce" will give you "Cedar Grove" and "Cedar Knolls" but also "Chatham" and "Cherry Hill", so you always get ten.
In LINQ:
var q = (from c in db.Cities
         where string.Compare(partialname, c.Name) < 0
         orderby c.Name
         select c.Name).Take(10);
