Interceptor in Flume with syslog data

Please find below a sample log message that I am receiving from syslog
<159>Apr 15 17:27:31 192.168.100.40 CEF:0|Websense|Security|7.8.1|68|Transaction permitted|1| act=permitted app=http dvc=192.168.100.40 dst=221.135.111.120 dhost=img-d01.moneycontrol.co.in dpt=80 src=172.16.237.89 spt=55016 suser=LDAP://172.17.251.11 OU\=Users,OU\=Migrated,DC\=abc,DC\=com/Sourabh Jain destinationTranslatedPort=38419 rt=1460721451000 in=496 out=6999 requestMethod=GET requestClientApplication=Mozilla/5.0 (Windows NT 6.1; WOW64; rv:41.0) Gecko/20100101 Firefox/41.0 reason=- cs1Label=Policy cs1=role-8**Default cs2Label=DynCat cs2=0 cs3Label=ContentType cs3=image/jpeg cn1Label=DispositionCode cn1=1048 cn2Label=ScanDuration cn2=3 request=http://img-d01.moneycontrol.co.in/news_html_files/wealth-experts/abhim1132661059.jpg
As you can observe, there are key-value pairs in the data. Is there any way I can extract the values and store the data? I can't use a space as the separator, because some of the values in the key-value pairs contain spaces.
e.g:
suser=LDAP://172.17.251.11 OU\=Users,OU\=Migrated,DC\=abc,DC\=com/Sourabh S Jain
There are spaces within "Sourabh S Jain".

I was able to solve it using the OR operator:
(suser=-|suser=LDAP://.{1,150}/)
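The same lookahead idea generalizes to extracting every key=value pair at once: let each value run until the next " key=" token. Below is a minimal Java sketch (the class name CefKvExtractor is mine, not part of Flume; a real Flume interceptor would call something like this from its intercept() method). Escaped equals signs ("\=") in values, as in the LDAP DN, do not trigger the lookahead, so values containing spaces stay intact.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CefKvExtractor {

    // A value runs lazily until the next " key=" token or the end of the string.
    // "OU\=Users" does not match the lookahead because of the backslash before '='.
    private static final Pattern KV = Pattern.compile("(\\w+)=(.*?)(?=\\s\\w+=|$)");

    public static Map<String, String> parse(String extension) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = KV.matcher(extension);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String sample = "act=permitted app=http "
                + "suser=LDAP://172.17.251.11 OU\\=Users,DC\\=abc,DC\\=com/Sourabh S Jain "
                + "rt=1460721451000";
        // The suser value is kept whole, spaces and all.
        System.out.println(parse(sample).get("suser"));
    }
}
```

This is the same trick as the OR pattern above, just applied to every key instead of only suser.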


Shibboleth variables not coming over with Coldfusion 2021 & IIS

I am trying to use Shibboleth 3 as the SP and Azure AD as the IdP, and based on the Shibboleth transaction log I can see that I have implemented it successfully.
2022-12-16 12:35:54|Shibboleth-TRANSACTION.AuthnRequest|||https://sts.windows.net/c04845f0-4224-4637-aed2-9beea8319b5b/||||||urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect||||||
2022-12-16 12:35:55|Shibboleth-TRANSACTION.Login||_292e2cf9f81890bcdf7ffe1cd147c92f|https://sts.windows.net/c04845f0-4224-4637-aed2-9beea8319b5b/|_ff1422a3-4c91-4255-adec-fa6fd52d2600|urn:oasis:names:tc:SAML:2.0:ac:classes:Password|2022-12-16T07:00:19|authnmethodsreferences(2),displayname(1),emailaddress(1),givenname(1),groups(1),identityprovider(1),objectidentifier(1),surname(1),tenantid(1)|davisg1#XXXXX.com|urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST||urn:oasis:names:tc:SAML:2.0:status:Success|||Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 Edg/108.0.1462.46|167.244.201.154
I changed the email in the text above to "davisg1#XXXXX.com" for obvious reasons.
However, I can't seem to retrieve the variables on my ColdFusion page. I have googled endlessly and not found an answer.
I tried dumping cgi and getHTTPRequestData(), and I also tried hardcoding like http_givenName #cgi['http_givenName']# and HTTP_REMOTE_USER #cgi['HTTP_REMOTE_USER']#, but nothing useful appears.
I have updated my attributes-map.xml to use the "name" field returned by Azure AD and made sure in shibboleth2.xml that ApplicationDefaults REMOTE_USER uses persistent-id:
<Attribute name="http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name" id="persistent-id">
  <AttributeDecoder xsi:type="NameIDAttributeDecoder" formatter="$NameQualifier!$SPNameQualifier!$Name" defaultQualifiers="true"/>
</Attribute>
<ApplicationDefaults entityID="https://intranettest.amc.edu/shibboleth-sp"
REMOTE_USER="eppn subject-id pairwise-id persistent-id"
cipherSuites="DEFAULT:!EXP:!LOW:!aNULL:!eNULL:!DES:!IDEA:!SEED:!RC4:!3DES:!kRSA:!SSLv2:!SSLv3:!TLSv1:!TLSv1.1">
The answer was to add useHeaders="true" to the ISAPI tag in shibboleth2.xml
<ISAPI normalizeRequest="true" safeHeaderNames="true" useHeaders="true">

Solr: how to limit the search content in a Solr query

I want to search for words up to a particular line and not beyond it using a Solr query. I have tried a proximity match but it didn't work. My data is like:
"Date: Thu, 24 Jul 2014 09:36:44 GMT\nCache-Control: private\nContent-Type: application/json; charset=utf-8\nContent-Encoding: gzip\nVary: Accept-Encoding\nP3P: CP=%20CURo TAIo IVAo IVDo ONL UNI COM NAV INT DEM STA OUR%20\nX-Powered-By: ASP.NET\nContent-Length: 570 \nKeep-Alive: timeout=120\nConnection: Keep-Alive\n\n[{%20rows%20:[],%20index%20:[],%20folders%20:[[%20Inbox%20,%20Inbox%20,%20%20,1,1,0,0,0,%20Inbox%20,0,0,%20none%20,0],[%20Drafts%20,%20Drafts%20,%20%20,1,1,0,0,0,%20Drafts%20,0,0,%20none%20,0],[%20Sent%20,%20Sent%20,%20%20,1,1,0,0,11,%20Sent%20,1,0,%20none%20,0],[%20Spam%20,%20Spam%20,%20%20,1,1,0,0,0,%20Spam%20,1,0,%20none%20,0],[%20Deleted%20,%20Trash%20,%20%20,1,1,0,7,9,%20Deleted%20,1,0,%20none%20,0],[%20Saved%20,%20Saved Mail%20,%20%20,1,1,0,0,0,%20Saved%20,1,0,%20none%20,0],[%20SavedIMs%20,%20Saved Chats%20,%20Saved%20,2,1,0,0,0,%20SavedIMs%20,1,0,%20none%20,0]],%20fcsupport%20:true,%20hasNewMsg%20:false,%20totalItems%20:0,%20isSuccess%20:true,%20foldersCanMoveTo%20:[%20Sent%20,%20Spam%20,%20Deleted%20,%20Saved%20,%20SavedIMs%20],%20indexStart%20:0}]POST /38664-816/aol-6/en-us/common/rpc/RPC.aspx?user=hl1lkgReIh&transport=xmlhttp&r=0.019667088333411797&a=GetMessageList&l=31211 HTTP/1.1\nHost: mail.aol.com\nUser-Agent: Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,/;q=0.8\nAccept-Language: en-US,en;q=0.5\nAccept-Encoding: gzip, deflate\nContent-Type: application/x-www-form-urlencoded; charset=UTF-8\nX-Requested-With: XMLHttpRequest\nReferer: http://mail.aol.com/38664-816/aol-6/en-us/Suite.aspx\nContent-Length: 452\nCookie: mbox=PC#1405514778803-136292.22_06#1407395182|session#1406185366924-436868#1406187442|check#true#1406185642; s_pers=%20s_fid%3D55C638B5F089E6FB-19ACDEED1644FD86%7C1469344726539%3B%20s_getnr%3D1406186326569-Repeat%7C1469258326569%3B%20s_nrgvo%3DRepeat%7C1469258326571%3B; s_vi=[CS]v1|29E33A0D051D366F-60000105200097FF[CE]; 
UNAUTHID=1.5efb4a11934a40b8b5272557263dadfe.88c5; RSP_COOKIE=type=30&name=YWxzaGFraWIyMDE0&sn=MzRb%2FjjHIe8odpr%2FfxZR2g%3D%3D&stype=0&agrp=M; LTState=ver:5&lav:22&un:*UQo5AwAnAytffwJSYg%3d%3d&sn:*UQo5AwAnAytffwJSYg%3d%3d&uv:AOL&lc:en-us&ud:aol.com&ea:*UQo5AwAnAytffwJSCAsnWWoJASZL&prmc:825345&mt:6&ams:1&cmai:365&snt:0&vnop:False&mh:core-mia002b.r1000.mail.aol.com&br:100&wm:mail.aol.com&ckd:.mail.aol.com&ckp:%2f&ha:1NGRuUTRRxGFF2s5A4JwkuCT43Q%3d&; aolweatherlocation=10003; DataLayer=cons%3D6.107%26coms%3D629; grvinsights=69f3a2bb86ed3cd31aa1d14a1ce9e845; CUNAUTHID=1.5efb4a11934a40b8b5272557263dadfe.88c5; s_sess=%20s_cc%3Dtrue%3B%20s_sq%3Daolcmp%253D%252526pid%25253Dcmp%2525253A%25252520Help%25252520%2525257C%25252520View%25252520Article%2525253A%25252520Clear%25252520cookies%2525252C%25252520cache%2525252C%25252520history%25252520and%25252520footprints%252526pidt%25253D1%252526oid%25253Dhttp%2525253A%2525252F%2525252Fwebmail.aol.com%2525252F%2525253F_AOLLOCAL%2525253Dmail%252526ot%25253DA%2526aolsnssignin%253D%252526pid%25253Dsso%25252520%2525253A%25252520login%252526pidt%25253D1%252526oid%25253DSign%25252520In%252526oidt%25253D3%252526ot%25253DSUBMIT%3B; L7Id=31211; Context=ver:3&sid:923f783b-bc6e-4edf-87c9-e52f19b3ce67&rt:STANDARD&i:f&ckd:.mail.aol.com&ckp:%2f&ha:X80Ku4ffRKsOVSwgmEVPCfpfxeU%3d&; IDP_A=s-1-V0c3QiuO6BzQ5S6_u3s0brfUqMCktezAz7sWlVfHD90omIijDXRrMJkSM-9-xcnUcSTnXbcZ1aUCgvfuToVeJihcftKY5KtsC_nB7Y9qf6P0xUnNfCIAmWVtRf4ctSQ9JwRIzHa40dhFuULwYLu3NUPTxckeFUFAzcSS4hrmb4grhEtyOGp0qV5rIKtjs4u8; MC_CMP_ESK=NonSense; SNS_AA=asrc=2&sst=1406185424&type=0; _utd=gd#MzRb%2FjjHIe8odpr%2FfxZR2g%3D%3D|pr#a|st#sns.webmail.aol.com|uid#; 
Auth=ver:22&uas:*UQo5AwAnAytffwJSZAskRiwLBSIDWVpVXxVTVwJCLFxdSnpHUWBbeV1jcikERgl6CEYLJUweGUhdFQQLW1h%2bBAZRcllWfVl8VH4DUmRaZARoPhw%2bBFBA&idl:0&un:*UQo5AwAnAytffwJSYg%3d%3d&at:SNS&sn:*UQo5AwAnAytffwJSYg%3d%3d&wim:%252FwQCAAAAAAAEk2ihy%252BE4MMebm4R1jvxY07zNZhFOHSz2EFBnsNdOAUsl8QyZceo54kWYZ4vwVayLFF7w&sty:0&ud:aol.com&uid:hl1lkgReIh&ss:635417678271359104&svs:SNS_AA%7c1406185424&la:635417687268954835&aat:A&act:M&br:100&cbr:AOL&mt:&pay:0&mbt:G&uv:AOL&lc:en-us&bid:1&acd:1403348988&pix:3829&prmc:825345&relm:aol&mah:%2\nConnection: keep-alive\n"
I want to search for Content-Type: application/json in the data and not beyond that line. I have tried:
http://192.168.0.164:8983/solr/collection_with_all_details/select?q=Content%3AContent-Typejson*&wt=json&indent=true
but it searches the entire content. I need to limit the searched content.
I don't think it is possible in this case. You can check the highlighter to return the first 200 characters in the highlighting response.
Maybe you need to think about writing a custom response writer, which could help with this.
One more option can be to create an additional field with indexed="false" stored="true", which will be more efficient.
Make your original field indexed="true" stored="false" so that your index size is reduced; the new copy field will be indexed="false" stored="true".
<copyField source="text" dest="textShort" maxChars="200"/>
Check if this works out for you.
You should really, really pre-process your data to just index the part that you're going to use. Doing it after the fact will not be a good solution, as you'll have most of the content in the index already, and you're looking for a separator that's not positioned in one specific byte location (which is what maxChars would be able to do).
Depending on how you're indexing, you can either do it in the indexing step (with a RegexTransformer, in your own code using SolrJ, etc.), or in the analysis chain, by using something like a PatternReplaceCharFilter. That would allow you to remove anything after the header you're looking for.
That way you should be able to index the content into one header field and one body field for example, depending on your need.
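As a sketch of that analysis-chain option (the fieldType name headers_only is mine, and it assumes the body always starts at the first blank line, as in an HTTP message), Solr's PatternReplaceCharFilterFactory can strip everything from the blank line onward before tokenization:

```xml
<fieldType name="headers_only" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Remove everything from the first blank line onward,
         so only the header section is tokenized and indexed. -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="(?s)\r?\n\r?\n.*" replacement=""/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note that the char filter only affects what is indexed, not the stored value; and if, as in the dump above, the line breaks are stored as the literal two-character sequence \n rather than real newlines, the pattern would need to match \\n\\n instead.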

camel netty4http and camel rest dsl: Get remote address

I'm looking for a way to get the IP address with the Camel REST DSL and the Netty4 HTTP component.
I checked the documentation, put a breakpoint on my REST route, and looked at the headers, the properties... everywhere, and I couldn't find a proper way to get this information.
Headers log:
GET: http://localhost:8080/category,
{Accept=text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,/;q=0.8, Accept-Encoding=gzip, deflate, sdch, Accept-Language=fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4, breadcrumbId=ID-nateriver-54582-1445489005229-0-1, CamelCATEGORY_ACTION=listAction, CamelHttpMethod=GET, CamelHttpPath=, CamelHttpUri=/category, CamelHttpUrl=http://localhost:8080/category, CamelJmsDeliveryMode=2, Connection=keep-alive, Content-Length=0, Cookie=JSESSIONID=fowfzar8n09e16ej9jui6nmsv, Host=localhost:8080, JMSCorrelationID=null, JMSDeliveryMode=2, JMSDestination=topic://Statistics, JMSExpiration=0, JMSMessageID=ID:nateriver-54592-1445489009836-3:1:7:1:1, JMSPriority=4, JMSRedelivered=false, JMSReplyTo=null, JMSTimestamp=1445489017233, JMSType=null, JMSXGroupID=null, JMSXUserID=null, Upgrade-Insecure-Requests=1, User-Agent=Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.71 Safari/537.36}
You should get two headers populated:
CamelNettyLocalAddress and CamelNettyRemoteAddress.
See here where the debug log of netty-http shows this clearly.
http://camel.465427.n5.nabble.com/How-to-create-case-insensitive-URI-route-with-netty4-http-td5766517.html#a5766558

Prevent connection close with JSoup

I am not very knowledgeable when it comes to networking (i.e., HTTP) or JSoup. I am using JSoup to get meta tag contents from a URL. I am getting the error
Connection closed unexpectedly by server at URL: http://blahblah
Here is my code
Document doc = Jsoup.connect(url).get();
Elements metas = doc.getElementsByTag("meta");
...
How do I "configure" JSoup to just grab the content of the webpage, close the connection, and then proceed to parse the content obtained? I am asking the question like this because I imagine the connection is being closed because the request takes too long. Or is it something else, like the server knowing it's not a human caller? Say the site is CNN or whatever and I am trying to parse a news article for meta-tag contents. And no, I am not crawling: I am given a URL and I am sifting through that one page.
Maybe you have to send some header data, as below. Please try it:
Document doc = Jsoup
        .connect(url.trim())
        .timeout(3000)
        .header("Host", "someip")
        .header("Connection", "keep-alive")
        .header("Content-Length", "111")
        .header("Cache-Control", "max-age=0")
        .header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8")
        .header("User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36")
        .header("Content-Type", "application/x-www-form-urlencoded")
        .header("Referer", url.trim())
        .header("Accept-Encoding", "gzip,deflate,sdch")
        .header("Accept-Language", "en-US,en;q=0.8,ru;q=0.6")
        .userAgent("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36")
        .get();
I have absolutely no idea why, but the problem stops when I do
Connection connection = Jsoup.connect(url);
Document doc = connection.get();
Elements metas = doc.getElementsByTag("meta");
...
Instead of
Document doc = Jsoup.connect(url).get();
Elements metas = doc.getElementsByTag("meta");
...
It makes no sense to me at all, but it is what it is. I have heard of "constructors escaping", which is what led me to do the separation. And while this is probably not the same thing, some similar kind of voodoo may be happening under the hood that I just don't understand.

Parsing HTTP Headers

I've had a newfound interest in building a small, efficient web server in C and have had some trouble parsing the POSTed data out of the HTTP request. Would anyone have any advice on how to retrieve the name/value pairs from the "posted" data?
POST /test HTTP/1.1
Host: test-domain.com:7017
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://test-domain.com:7017/index.html
Cookie: __utma=43166241.217413299.1220726314.1221171690.1221200181.16; __utmz=43166241.1220726314.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none)
Cache-Control: max-age=0
Content-Type: application/x-www-form-urlencoded
Content-Length: 25
field1=asfd&field2=a3f3f3
// ^-this
I see no tangible way to retrieve the bottom line as a whole and ensure that it works every time. I'm not a fan of hard-coding in anything.
You can retrieve the name/value pairs by searching for newline newline or more specifically \r\n\r\n (after this, the body of the message will start).
Then you can simply split the list by the &, and then split each of those returned strings between the = for name/value pairs.
See the HTTP 1.1 RFC.
Once you have Content-Length in the header, you know the amount of bytes to be read right after the blank line. If, for any reason (GET or POST) Content-Length is not in the header, it means there's nothing to read after the blank line (crlf).
You need to keep parsing the stream as headers until you see the blank line. The rest is the POST data.
You need to write a little parser for the post data. You can use C library routines to do something quick and dirty, like index, strtok, and sscanf. If you have room for it in your definition of "small", you could do something more elaborate with a regular expression library, or even with flex and bison.
At least, I think this kind of answers your question.
IETF RFC notwithstanding, here is a more to-the-point answer. Assuming that you realize there is always an extra \r\n after the Content-Length line in the header (the blank line), you should be able to do the work to isolate the body into a char* variable named data. This is where we start.
char *data = "f1=asfd&f2=a3f3f3";
char f1[100];
char f2[100];
// %s stops at whitespace, not '&', so use scansets to get the field tuples
sscanf(data, "%99[^&]&%99s", f1, f2);

char f1_name[50];
char f1_data[50];
sscanf(f1, "%49[^=]=%49s", f1_name, f1_data);

char f2_name[50];
char f2_data[50];
sscanf(f2, "%49[^=]=%49s", f2_name, f2_data);
