What does the GUID from the MS Sysinternals tool Sysmon mean? - uuid

I have GUIDs that the Sysinternals tool Sysmon generated.
They look like this:
3/18 C591B94E-4BDD-5AAE-0000-001073B13706
4/4 C591B94E-1BFA-5AC5-0000-0010E76F3903
4/29 C591B94E-A33F-5AE5-0000-001074CA4C26
5/2(different windows account) C591B94E-E23B-5AE9-0000-0010DD40EF32
5/2(on the virtual machine) A15730FB-E3DA-5AE9-0000-0010AB2C0800
They were generated on process creation (Event ID 1) on my computer, on different days and in different environments.
I found the UUID format (https://en.wikipedia.org/wiki/Universally_unique_identifier):
xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx (M indicates the UUID version, and the one to three most significant bits of digit N indicate the UUID variant)
According to this, in my 3/18 example, C591B94E-4BDD-5AAE-0000-001073B13706, M is 5 and N is 0. In other words, the UUID version is 5 (a SHA-1 hash value) and the variant is 0.
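The M and N digits can also be read programmatically. Below is a minimal sketch using the standard `java.util.UUID` class; the class name `GuidFields` is only for illustration. Note that the version digit is only defined for the RFC 4122 variant (N between 8 and b), so with variant 0, reading M = 5 as "SHA-1 name-based" is an assumption rather than something the format guarantees.

```java
import java.util.UUID;

// Illustrative helper (hypothetical name) that reads the version and
// variant fields of a GUID string with the standard java.util.UUID class.
class GuidFields {
    // Digit M: the high 4 bits of the third group.
    static int version(String guid) {
        return UUID.fromString(guid).version();
    }

    // Derived from the top bits of digit N
    // (0 = reserved NCS variant, 2 = RFC 4122 variant).
    static int variant(String guid) {
        return UUID.fromString(guid).variant();
    }

    public static void main(String[] args) {
        String sample = "C591B94E-4BDD-5AAE-0000-001073B13706";
        System.out.println("version=" + version(sample) + " variant=" + variant(sample));
    }
}
```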
I really wonder what the other numbers mean. The Sysmon documentation says the GUID is helpful for correlation, but it never explains what the numbers mean.
I guess the first group is related to PC information, because it changed only when I changed the PC (5/2 on the virtual machine): C591B94E -> A15730FB. So I thought it was related to the MAC or IP address. But even after I changed the MAC and IP address, it stayed A15730FB or C591B94E.
I'm sure the second group is related to time.
But I can't figure out what exactly it means.
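The guess about the time field can be checked against the samples above. A minimal sketch: if the third and second groups are read together, in that order, as one hexadecimal number of seconds since 1970-01-01 UTC, the sample values land on the listed dates (in 2018, UTC). This layout is inferred from these five samples only and is not taken from the Sysmon documentation; the class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper: treats groups 3 and 2 of the GUID (in that order) as
// a 32-bit count of seconds since 1970-01-01 UTC. The layout is an
// assumption inferred from the sample values, not documented behavior.
class SysmonGuidTime {
    static Instant guessedStartTime(String guid) {
        String[] groups = guid.split("-");
        // e.g. "5AAE" + "4BDD" -> 0x5AAE4BDD seconds
        long seconds = Long.parseLong(groups[2] + groups[1], 16);
        return Instant.ofEpochSecond(seconds);
    }

    static LocalDate guessedStartDateUtc(String guid) {
        return guessedStartTime(guid).atOffset(ZoneOffset.UTC).toLocalDate();
    }

    public static void main(String[] args) {
        String[] samples = {
            "C591B94E-4BDD-5AAE-0000-001073B13706",
            "C591B94E-1BFA-5AC5-0000-0010E76F3903",
            "C591B94E-A33F-5AE5-0000-001074CA4C26",
            "C591B94E-E23B-5AE9-0000-0010DD40EF32",
            "A15730FB-E3DA-5AE9-0000-0010AB2C0800"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + guessedStartTime(s));
        }
    }
}
```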

The GUID does not specifically mean anything in itself. Its purpose is to allow you to correlate and filter process events when Windows reuses process IDs (in this way you can think of it as a completely unique process ID).
From: https://learn.microsoft.com/en-us/sysinternals/downloads/sysmon
"Includes a process GUID in process create events to allow for correlation of events even when Windows reuses process IDs."

Related

How to Implement Patterns to Match Brute Force Login and Port Scanning Attacks using Flink CEP

I have a use case where a large number of logs will be consumed by Apache Flink CEP. My goal is to detect brute-force attacks and port-scanning attacks. The challenge is that in ordinary CEP we compare a value against a constant, like "event" = login. Here the criteria are different. For a brute-force attack the criteria are as follows:
The username is constant and event="login failure" (the delimiter is that the event happens 5 times within 5 minutes).
That is, logs with the login failure event are received for the same username 5 times within 5 minutes.
For port scanning we have the following criteria:
The IP address is constant and the destination port is variable (the delimiter is that the event happens 10 times within 1 minute). That is, logs with the same IP address are received for 10 different ports within 1 minute.
With Flink, when you want to process the events for something like one username or one IP address in isolation, the way to do this is to partition the stream by a key, using keyBy(). The training materials in the Flink docs have a section on Keyed Streams that explains this part of the DataStream API in more detail. keyBy() is roughly the same concept as a GROUP BY in SQL, if that helps.
With CEP, if you first key the stream, then the pattern will be matched separately for each distinct value of the key, which is what you want.
However, rather than CEP, I would instead recommend Flink SQL, perhaps in combination with MATCH_RECOGNIZE, for this use case. MATCH_RECOGNIZE is a higher-level API, built on top of CEP, and it's easier to work with. In combination with SQL, the result is quite powerful.
You'll find some Flink SQL training materials and examples (including examples that use MATCH_RECOGNIZE) in Ververica's github account.
Update
To be clear, I wouldn't use MATCH_RECOGNIZE for these specific rules; neither it nor CEP is needed for this use case. I mentioned it in case you have other rules where it would be helpful. (My reason for not recommending CEP in this case is that implementing the distinct constraint might be messy.)
For example, for the port scanning case you can do something like this:
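SELECT e1.ip, COUNT(DISTINCT e2.port)
FROM events e1, events e2
-- pair each event e1 with the events from the same IP in the minute that follows it
WHERE e1.ip = e2.ip AND e2.ts >= e1.ts AND TIMESTAMPDIFF(MINUTE, e1.ts, e2.ts) < 1
-- one group per starting event, so the count covers a single one-minute span
GROUP BY e1.ip, e1.ts HAVING COUNT(DISTINCT e2.port) >= 10;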
The login case is similar, but easier.
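To make the login rule concrete independently of any Flink API, here is a minimal plain-Java sketch of "same username, 5 login failures within 5 minutes". The class and method names are hypothetical, the per-username map stands in for what keyed state would hold in Flink, and the sketch assumes events arrive in timestamp order for each username.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch (no Flink dependency) of the rule
// "same username, 5 login failures within 5 minutes".
// Assumes events arrive in timestamp order per username.
class LoginFailureWindow {
    private static final int THRESHOLD = 5;
    private static final long WINDOW_MS = 5 * 60 * 1000L;

    // Per-username timestamps of recent failures.
    private final Map<String, Deque<Long>> failures = new HashMap<>();

    // Returns true when this failure means the user has at least
    // THRESHOLD failures inside the window ending at timestampMs.
    boolean onLoginFailure(String username, long timestampMs) {
        Deque<Long> recent = failures.computeIfAbsent(username, k -> new ArrayDeque<>());
        recent.addLast(timestampMs);
        // Drop failures that are more than 5 minutes older than the newest one.
        while (timestampMs - recent.peekFirst() > WINDOW_MS) {
            recent.removeFirst();
        }
        return recent.size() >= THRESHOLD;
    }
}
```

The port-scanning rule has the same shape, except that the window would hold (timestamp, port) pairs and the check would count distinct ports instead of events.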
Note that when working with streaming SQL, you should give some thought to state retention.
Further update
This query is likely to return a given IP address many times, but it's not desirable to generate multiple alerts.
This could be handled by inserting matching IP addresses into an Alert table, and only generating alerts for IPs that aren't already there.
Or the output of the SQL query could be processed by a de-duplicator implemented using the DataStream API, similar to the example in the Flink docs. If you only want to suppress duplicate alerts for some period of time, use a KeyedProcessFunction instead of a RichFlatMapFunction, and use a Timer to clear the state when it's time to re-enable alerts for a given IP.
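The suppression logic itself is small. Here is a minimal plain-Java sketch of it, with hypothetical names: the map plays the role of per-IP keyed state, and comparing timestamps plays the role of the Timer. Unlike a timer-based KeyedProcessFunction, this sketch never removes old entries, so it only illustrates the decision, not the state cleanup.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of alert de-duplication with a suppression period.
// In Flink the map would be keyed state and the expiry would be a timer.
class AlertSuppressor {
    private final long suppressMs;
    // Time of the last alert actually emitted for each IP.
    private final Map<String, Long> lastAlertAt = new HashMap<>();

    AlertSuppressor(long suppressMs) {
        this.suppressMs = suppressMs;
    }

    // Returns true if an alert for this IP should be emitted now.
    boolean shouldAlert(String ip, long nowMs) {
        Long last = lastAlertAt.get(ip);
        if (last != null && nowMs - last < suppressMs) {
            return false; // still inside the suppression period
        }
        lastAlertAt.put(ip, nowMs);
        return true;
    }
}
```

Suppressed duplicates do not extend the period here; whether they should is a design choice that depends on how noisy the alerts are.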
Yet another update (concerning CEP and distinctness)
Implementing this with CEP should be possible. You'll want to key the stream by the IP address, and have a pattern that has to match within one minute.
The pattern can be roughly like this:
Pattern<Event, ?> pattern = Pattern
    .<Event>begin("distinctPorts")
    .where(/* iterative condition 1 */)
    .oneOrMore()
    .followedBy("end")
    .where(/* iterative condition 2 */)
    .within(Time.minutes(1));
The first iterative condition returns true if the event being added to the pattern has a distinct port from all of the previously matching events. This is somewhat similar to the example in the docs.
The second iterative condition returns true if size("distinctPorts") >= 9 and this event also has yet another distinct port.
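The two conditions can be sketched as plain Java predicates, shown below without any Flink dependency. The `Event` type and all names are hypothetical. In a real job this logic would presumably sit inside `IterativeCondition.filter`, with the previously matched events read through `ctx.getEventsForPattern("distinctPorts")` instead of being passed in as a list.

```java
import java.util.List;

// Plain-Java sketch of the two iterative conditions described above.
class DistinctPortConditions {
    // Hypothetical event type, for illustration only.
    static final class Event {
        final String ip;
        final int port;

        Event(String ip, int port) {
            this.ip = ip;
            this.port = port;
        }
    }

    // Condition 1: accept the event only if its port differs from the
    // port of every previously matched event.
    static boolean hasNewPort(Event candidate, List<Event> matchedSoFar) {
        for (Event e : matchedSoFar) {
            if (e.port == candidate.port) {
                return false;
            }
        }
        return true;
    }

    // Condition 2: at least 9 events (all with distinct ports) already
    // matched, and this event brings yet another distinct port.
    static boolean completesScan(Event candidate, List<Event> matchedSoFar) {
        return matchedSoFar.size() >= 9 && hasNewPort(candidate, matchedSoFar);
    }
}
```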
See this Flink Forward talk (youtube video) for a somewhat similar example at the end of the talk.
If you try this and get stuck, please ask a new question, showing us what you've tried and where you're stuck.

Flink - Grouping query to external system per operator instance while enriching an event

I am currently writing a streaming application where:
as input, I receive alerts from a Kafka topic (1 alert is linked to 1 resource; for example, an alert will be linked to my-router-1, my-switch-1, my-VM-1, my-VM-2, ...)
I then need to query an external system to enrich each alert with additional information about the resource the alert is linked to
When querying the external system:
I do not want to do 1 query per alert, or even 1 query per resource
I would rather do grouped queries (1 query for several alerts linked to several resources)
My idea was to have something like n buffers (n being a small number representing the number of queries that I will do in parallel), and then, for a given time period (let's say 100 ms), put all alerts into one of those buffers and, at the end of those 100 ms, do my n queries in parallel (1 query being responsible for enriching several alerts belonging to several resources).
In Spark, I would do this through mapPartitions (if I have n partitions, then I will do only n queries in parallel to my external system, and each query will cover all the alerts received during the micro-batch for one partition).
Now I am looking at Flink, and I haven't really found the best way of doing this kind of grouping when querying an external system.
When looking at this kind of use case, and especially at async I/O (https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/asyncio.html), it seems that it deals with 1 query per key.
For example, I can very easily:
define the resource id as a key
define a processing-time window of 100 ms
and then do my query to the external system (synchronously or, maybe better, asynchronously through the async I/O feature)
But by doing so, I will do 1 query per resource (maybe for several alerts, but all linked to the same key, i.e. the same resource).
That is not what I want, as it will lead to too many queries to the external system.
I then explored the option of defining a kind of technical key for my requests (something like the hashCode of my resource id % the number of queries I want to perform).
So, if I want to do at most 4 queries in parallel, then my key will be something like "resourceId.hashCode % 4".
I thought that was OK, but when looking more closely at some metrics while running my job, I found that my queries were not well distributed across my 4 operator instances (only 2 of them were doing something).
This comes from the mechanism used to assign a key to a given operator instance:
public static int assignKeyToParallelOperator(Object key, int maxParallelism, int parallelism) {
return computeOperatorIndexForKeyGroup(maxParallelism, parallelism, assignToKeyGroup(key, maxParallelism));
}
(In my case, parallelism is 4, maxParallelism is 128, and my key values are in the range [0,4[. In this context, 2 of my keys go to operator instance 3 and 2 to operator instance 4, while operator instances 1 and 2 have nothing to do.)
I was expecting key 0 to go to operator 0, key 1 to operator 1, key 2 to operator 2 and key 3 to operator 3, but that is not the case.
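The two-step mapping can be sketched in plain Java to show why 4 keys need not land on 4 different instances. This is a sketch under assumptions: the key-group-to-operator arithmetic is what I understand Flink's `computeOperatorIndexForKeyGroup` to do, while `standInHash` is a made-up scrambling function and not Flink's murmur hash, so the concrete keys it finds are not valid for a real job. The same search could be run against Flink's own `assignKeyToParallelOperator` to find one key per operator instance.

```java
import java.util.function.IntUnaryOperator;

// Sketch of key -> key group -> operator index, with a pluggable hash.
class KeyGroupSketch {
    // Each operator instance owns a contiguous range of key groups.
    static int operatorIndexForKeyGroup(int maxParallelism, int parallelism, int keyGroup) {
        return keyGroup * parallelism / maxParallelism;
    }

    // The key's hash code is scrambled, then reduced to a key group.
    static int keyGroupFor(int keyHash, int maxParallelism, IntUnaryOperator hash) {
        return Math.floorMod(hash.applyAsInt(keyHash), maxParallelism);
    }

    static int operatorIndexFor(int key, int maxParallelism, int parallelism, IntUnaryOperator hash) {
        int keyGroup = keyGroupFor(Integer.hashCode(key), maxParallelism, hash);
        return operatorIndexForKeyGroup(maxParallelism, parallelism, keyGroup);
    }

    // Searches for one integer key per operator index; result[i] is a key
    // that maps to operator i under the given hash.
    static int[] keysCoveringAllOperators(int maxParallelism, int parallelism, IntUnaryOperator hash) {
        int[] keys = new int[parallelism];
        boolean[] found = new boolean[parallelism];
        int remaining = parallelism;
        for (int candidate = 0; remaining > 0 && candidate < 1_000_000; candidate++) {
            int idx = operatorIndexFor(candidate, maxParallelism, parallelism, hash);
            if (!found[idx]) {
                found[idx] = true;
                keys[idx] = candidate;
                remaining--;
            }
        }
        if (remaining > 0) {
            throw new IllegalStateException("no key found for some operator index");
        }
        return keys;
    }

    // Stand-in scrambling function for the sketch; NOT Flink's murmur hash.
    static int standInHash(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        return h;
    }
}
```

With parallelism 4 and maxParallelism 128, each instance owns 32 consecutive key groups, so where a key lands depends only on its scrambled hash and not on the key's numeric value.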
So do you know what the best approach would be for this kind of grouping while querying an external system?
i.e. 1 query per operator instance for all the alerts "received" by this operator instance during the last 100 ms.
You can put an aggregator function upstream of the async function, where that function (using a timed window) outputs a record with <resource id><list of alerts to query>. You'd key the stream by the <resource id> ahead of the aggregator, which should then get pipelined to the async function.

Movilizer - Masterdata pool id as integer across participant/devices

A masterdata descriptor like this one: $masterdata:"pool_name" is converted in the Movilizer client to an integer number like 113.
We are building a logic that sends back to the backend the poolid (113) and the key modified (key="key1") in a DataContainer.
The DataContainer key is formed like this: "poolid$$key", my question is:
Is the poolid integer the same across participants/devices (always 113), or is it random depending on the client?
I need to know this in case I have to send a string with the pool name instead of the poolid for this to work.
If I had to guess, I'd say the poolID mapping is the same across all participants/devices and depends on the order in which pools are created in a systemID. Probably the first pool created in the sysID gets the mapping 0 or 1, and so on. But this is just a wild guess, and I am not sure your approach is advisable at all. In the end it could all rely on an unsorted data structure, and the ordering of the IDs might change in unexpected ways because of that.

What's the difference between "Exchange Legacy Distinguished Name" and "Active Directory Distinguished Name"?

I'm a little confused by these two terms: "Legacy Distinguished Name" (Legacy DN) and "Distinguished Name" (DN).
The first term, Legacy DN, seems to be used only for Exchange, while the latter, DN, is only mentioned for Active Directory.
They are obviously not in same format:
DN is like: CN=Morgan Cheng, OU= SomeOrg, DC=SomeCom, DC=com
LegacyDN is like: /o=SomeDomain/ou=SomeGroup/cn=Recipients/cn=Morgan Cheng
I am still not clear on what exactly the difference is. Are they two totally different things, or just the same information represented in two different forms?
And why is it called "Legacy"? If it is legacy, something must be new, right?
I hope some AD and Exchange experts can give me some input.
In Exchange 5.5, Exchange assigned its own distinguished names to accounts and mailboxes (Obj-Dist-Name). When Active Directory came along, Exchange 2000 and later used Active Directory's distinguished names instead. To preserve backwards compatibility, migration from Exchange 5.5 to Exchange 2000 carried the old DNs over into the legacyExchangeDN attribute in Active Directory.
Some applications continue to refer to Obj-Dist-Name. To preserve compatibility with these applications, later Exchange versions synthesize a legacyExchangeDN value even for objects that were not migrated from Exchange 5.5. The RUS (Recipient Update Service) automatically sets it to some value, apparently to the same value as the distinguishedName in your case.
The "new" way (since 2000) is to refer to objects by distinguished name, not Obj-Dist-Name.

Entity relationship

I have started developing a database for a machinery performance management system.
Facts:
1. A machine (platNo, model, name) can work on several cane fields (fieldNo, fieldNo)
- machine vs field
2. Many machines can work on a cane field
3. A machine can do tasks for many userDept (deptId, deptName)
4. A userDept demands several machines for its activity {A task can be done on several cane fields; plowing, land shaping, etc. can be done on fields 1, 2, 3... - task vs field,
Many tasks can be done on a field; on field 1, plowing, harrowing, ... can be done
- task vs field?/?}
5. A machine can work for many userDept; lpcd (using its machine) can do the same type of work (e.g. plowing) for plantation, rehabilitation and expansion projects.
- task vs userDept
6. Many types of tasks can be done for a userDept; plowing, harrowing, ... can be done for plantation - task vs user
7. A machine works in three shifts (1 to 3)
Problem: please help me design the ER diagram!
Thanks,
Dejene
I'll assume platNo can be used as a unique identifier for a machine. There are quite a few possibilities depending on rules that you have left ambiguous - e.g. some of the following relations may not be required or may need to be modified:
MACHINE (platNo, model, name) - represents each machine
FIELD (fieldNo) - represents each cane field
TASK (taskId, taskName) - represents the various tasks (e.g. plowing, harrowing) that can be done by any machine
USERDEPT (deptId, deptName) - represents each department
PROJECT (projId, projName, deptId) - represents each project for each department (e.g. plantation, rehabilitation, expansion)
SHIFT (shiftNo) - represents the shifts that any machine might be assigned to
MACHINE_FIELD (platNo, fieldNo) - represents the fact that a particular machine can work on a particular cane field
MACHINE_TASK (platNo, taskId) - represents the fact that a particular machine can perform a particular task
PROJECT_REQUIREMENT (projId, taskId) - represents the fact that a particular project (for a particular department) requires a particular task
MACHINE_ASSIGNMENT (projId, taskId, shiftNo, platNo) - represents the fact that a particular machine has been assigned to perform a particular task on a given shift

Resources