Elixir's File.stream! splits lines on an assumed \n character.
Is it possible to specify for example, \r\n or any other pattern?
Such a convenience would make File.stream! a handy file parser.
Edit: Added source file content:
iex(1)> File.read! "D:\\Projects\\Telegram\\PQ.txt"
"1039027537039357001\r\n1124138842463513719\r\n1137145765766942221\r\n1159807134726147157\r\n1162386423249503807\r\n1166092057686212149\r\n1192934946182607263\r\n1239437837009623463\r\n1242249431735251217\r\n1286092661601003031\r\n1300223652350017207\r\n1320700236992142661\r\n1322986082402655259\r\n1342729635050601557\r\n1342815051384338027\r\n1361578683715077199\r\n1381265403472415423\r\n1387654405700676857\r\n1414719090657425471\r\n1438176310698548801\r\n1440426998028857687\r\n1444777794598883737\r\n1448786004429696643\r\n1449069084476072141\r\n1449922801627060913\r\n1459186197300152561\r\n1470497644058466497\r\n1497532721434112879\r\n1514370843858307907\r\n1528087672407582373\r\n1530255914631110911\r\n1537681216742780453\r\n1547498566041252091\r\n1563354550428106363\r\n1570520040759209689\r\n1570650619548126013\r\n1572342415580617699\r\n1595238677050713949\r\n1602246062455069687\r\n1603930707387709439\r\n1620038771342153713\r\n1626781435762382063\r\n1628817368590631491\r\n1646011824126204499\r\n1654346190847567153\r\n1671293643237388043\r\n1674249379765115707\r\n1683876665120978837\r\n1700490364729897369\r\n1724114033281923457\r\n1729626235343064671\r\n1736390408379387421\r\n1742094280210984849\r\n1750652888783086363\r\n1756848379834132853\r\n1769689620230136307\r\n1791811376213642701\r\n1802412521744570741\r\n1816018323888992941\r\n1816202297040826291\r\n1833488086890603497\r\n1834281595607491843\r\n1840295490995033057\r\n1843931859412695937\r\n1845134226412607369\r\n1847514467055999659\r\n1868936961235125427\r\n18733753
Example:
iex(134)> s |> Enum.to_list
["1039027537039357001\n", "1124138842463513719\n", "1137145765766942221\n",
"1159807134726147157\n", "1162386423249503807\n", "1166092057686212149\n",
"1192934946182607263\n", "1239437837009623463\n", "1242249431735251217\n",
"1286092661601003031\n", "1300223652350017207\n", "1320700236992142661\n",
"1322986082402655259\n", "1342729635050601557\n", "1342815051384338027\n",
"1361578683715077199\n", "1381265403472415423\n", "1387654405700676857\n",
"1414719090657425471\n", "1438176310698548801\n", "1440426998028857687\n",
"1444777794598883737\n", "1448786004429696643\n", "1449069084476072141\n",
"1449922801627060913\n", "1459186197300152561\n", "1470497644058466497\n",
"1497532721434112879\n", "1514370843858307907\n", "1528087672407582373\n",
"1530255914631110911\n", "1537681216742780453\n", "1547498566041252091\n",
"1563354550428106363\n", "1570520040759209689\n", "1570650619548126013\n",
"1572342415580617699\n", "1595238677050713949\n", "1602246062455069687\n",
"1603930707387709439\n", "1620038771342153713\n", "1626781435762382063\n",
"1628817368590631491\n", "1646011824126204499\n", "1654346190847567153\n",
"1671293643237388043\n", "1674249379765115707\n", "1683876665120978837\n",
"1700490364729897369\n", "1724114033281923457\n", ...]
iex(135)> s |> String.to_integer |> Primes.factorize |> Enum.to_list
Elixir handles the differences between Windows and Unix line endings just fine by always normalizing "\r\n" into "\n", so developers don't need to worry about both formats. That's what is happening in the example above, and that's what you should expect from the operations in both the IO and File modules.
You could open the file in raw mode (see here) and check the characters yourself.
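If you want the original "\r\n" boundaries (or any other separator) preserved, one option is to read the file in full and split it yourself: File.read!/1 does not normalize line endings, as the "\r\n" pairs in the question's output show. A minimal sketch, using the path from the question:

```elixir
# File.read!/1 returns the raw bytes, so any separator pattern works.
"D:\\Projects\\Telegram\\PQ.txt"
|> File.read!()
|> String.split("\r\n", trim: true)
|> Enum.map(&String.to_integer/1)
```

For files too large to read at once, File.stream!/3 also accepts a byte count as its third argument, which yields fixed-size binary chunks that you can re-split on any pattern you like.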
Related
I'm home sick and trying to view a worksheet my teacher posted to Canvas. The file is listed on Canvas as a .ks-ipc file, here's a link to it in Google Drive: https://drive.google.com/drive/folders/110hWYFenrT5Ymz5twMsEroS3zVCLi7jN?usp=sharing
The file seemed to contain a bunch of random characters, which, according to my limited internet research, may or may not be raw bytes. Here is what it said:
æ#z gÊ/ Èc| ZI> ♥ ☻ ýµ
☺▼Ïxœí]ol∟G§Ÿ=Û‰c'qâ$ý♥$]%¡
R←Çi„H♦¨ÆiÁRm§±)♣ ¢õÝÆY¸?fw/‰)↕◄¨B¨jÂ↨ª"!ñ…OHù‚„„ø H >ô‼â_)T¨)►J¨PZ’▬³swïnï2gßì¼Ý ›'|Þ }3û{¿7³óöÍ®A◘1→ÛÖÆßW☼‼2¼P]ò↔¿hï˜ ZüÄä±#‼ôïý“Ç>♦%†▬éá ³–ï:↨Íù§Ûµ|§Röàø–9«d?f-ÙÅAú«¹û¤å7vÓ_°{ô”í:•BíÀ¦úo846U(8TµU¬↔®µñ/Á ÍOØ®↨ì‡ö_§zfÊùbµ×ªìÜY«►vnkì¬WÖT°³±»UióÐöÆ¡Fµµý# º¯Ñj>n—♥eùÇ∟ÏÿÂ^º·~lm-h+™ž_¢•†‚ 2NfOÛVa¾\\ dù§w Q|sã/½ÄaÏ·\®Z2B¶↓)8® ¯ ýð‚SZ):gW☼›æ§\Ç·ÍýÕrÁ>ë”íÂ~ólÅ5í‹+®íÑöz¦ÎòM˵Íf™ÃPÑŽÒ™òŠ•·↨]Ë):ååË# ÷¬œñ|wºêù•ÒÉfÝÍFÞµrfé“ž}ÇñP“G♂¶—w §zà¾♠e aÓ*↨LJ7ת ◘pnýbÕöè♫ ÛU“°ñÌÕðÜJf▼¹¸òxã„Á ÐØX½ð–£ÿO}lúä#&Àe♠È\èÄë„iÖ ▬”Î◄òÜÜ☼J/ÏÞؕЃ♀…\f0d«]ùs§'oÏxs•²=vñœíÙkk´u„¼▲∟ß]´<¶RpÎ:v RqÑ)Õ
’{¬b±raúNMF¨Ðx©¦%_s¸♣ßò«^¸1F
’Q2← }n~éó =(·ŠÁv4Øèo;Ø~§lÔþW‚í3
Ÿ♫♫æ6(7▲•Õ’
ÎÙ▲l♂ÁÁ4΃KþNpùðÿ¡”t¼‼œ/ªã]†♫#´ÕÝÄuÊþ”7U].Ùeß.Ôý#è°àþ_L—É«c◘Ád„¡ƒ♥î°<↨ríúéC—zñ–ñF³DͲ† é ☻¤↑ž$è♣††<iÈATð☻Œ♫îôÇ‚,y " ƒh/h]»¨Ž[Ú♂2♠9ˆö☻<“è± k ƒh/¨K.}“ô›↨H 9HØ♂♠Hwi§←¸♦Ñ7ZuôèÛ3‼;~²úʕ׌6홈¾õÖQD · Ey↓±‰Ä‼*ãU↕—y;&•Œà<T°†‘¨pŒ¨¬~1Æî;ת’ÕY#™¢V↨¼T•¬Î→%¶µºà¥ªdõ„oÐ3|©*Y 0t(juÁK♣é´z71Ú~ÑmS¨¢ÑSne©h—▬WWl(C“$¶Î”V*nPñ£n¥´{¦∟Ì‹è∟é”kçb¾Z¬6skvͻβS¶Š'ùÒ*è¹ÛàÜéFäí ŸÖÊß™95ÝLI¡ç¼▬œs/œ³û¾S^ö▲¸Sùªã¯▲ì¸ân¨ ♫TÛ↑h›¯5ÜûšÑ*C'l{ɺ →h¶Õy2Œ<™]$Œe(O¦7‼ \Z↨]8¨ º•▬☻§²\s¶Zô ñõé¼yÁñÏ™5àó¶ÇDøÕ}]►^/‼‰☻↨=▬rß ¿üÓ¥7VŸ☻rƒd ▬↕)‼i”ô▬→aõô´,ÏhÑ↓w€(—,³Ï”îo¨ÝãD…¶#Ûˆ—♫,↔[↑:b¼Oês¸#Òf▼♠¶↑&aåGHÈ>Eà☻I›}↑PH g–ú> à☻Q }↑↔B▼±O☻¸#ÒfŸ,CIFîû¶ ♂„?ÆÖJ9↑$"Óì▼▲▼Üóé•Þ↑¤½æ♀L³{sP‘yuŽaÝ$æÕ↑Ì$œmÏHN#TH1àÀÐÁZC!aÿª◘\ i³☼#☼ ♥NÁ\¦¤Ø§◘\ i³ •#Ÿ♠œÃ♀↔↕²O◄¸#4ûê’‘‘W◄¸#ÒfŸ,CIFا◘\ ü©üPAü¹+½>Œ ?åN¶Ú±å¯|Ýh•◄Í_ ƒýœaÑC♫ËzfÚ ‡♣¢9 ¼§‘!ÔŽ0‼Ó·▼ì‚ézY+¶ªèá´Ñ+OþáÖôæIp↔ ♀„ÓbÍZ GÆh↓è×1Æ♣^↔C◘:0²V♠►Ú ±ò‡¥ƒc¬íÕì°E1;F#¶c•"♠ý V¬SÁ¸¹▬ñÆ·◄ ý{LOaxuì#Ðñ2µü6¹► 6½R¦•I»WÅxuü>¦¼¶^↔»◄®♣cŒ>Äh‡„½‚6½t¦•I»Wø3☻¶²Pã♠‚Ž♥♀³Ú+hÓKgz ´{♣Y♠Œ1„vPC‚☺£ßé¥ë§L♦“Pãß’ÜFbL.3Ò+hÓKgz►‘÷¢ÑÇ©ÑŸª|õéñ↔7þµû↨F›öL<UÁMRƒ↓Òˆ…˲üˆŠ♀í►LYHâé↓Fv•♠Tq=µ’å±R„Ε
Æm↨l$%8è÷”dˆéofèˆévGCŽ☻9HÚ^€‘€ù^♦81tdÄ♂4äÒy☺F§‚☺ç#ú&IÊ♂4äÒyÁ←’ÜdJ’¤¼#C. ↨♀Ï↑Qv L’¥;"E • {AŽt↨£í↨Ýâ_ËÑ=)=¹Õ← ûàêY«7V V↓ÑÕ←►„í¯µ♣∟ë5X†Ù½♠ìT Ó»ïøX¶yhªXü ‼Õ ”º º©ËŠ ¯ŒÔ◄Œþlổo>ÿ·Ñg¯æ:¬” g♂±®Øˆò©▬Z&êˆ÷Ž$1aŒG#↑í ü“xe‡ÈkuÖ8áÁ¢X♫☺RŒ←"‚ ♥ãQ♫¯↔zÕ D±Î‡ Ð;§áIHë‘©%×D…ãg◘V◄AÇ▼ŘÃ3‼Óp5%möaäæÀI÷e€}ŠÀ♣’6ûdq挰O◄¸#4ûêò€↑œýÆ>A¸#Òf▼ƃ§Œ°4Ë$↕²O◄¸#ø£çõä]→£+5¢•Ø&_↓¼ýf~mŸÑ¦=‼☺¶øß0Ì;7íœ‼G
…¼È☻#b↨×û\"x◘•´Ÿ´b¬↑¸Ž▬Œv∟►3♂O?¯!G ∟D♣/ Eǽb&É’↨È¢C►r►§¼#–Ž© ¼#◄ÈAÒö‚}Ú$I{ †:/À€S◄“$å♣→rÔ↑↕ ▼E !½ôío\ûç³·Þ͵iW(†Ôk¶•H¬‰7Ö↕Ž5↓ r$B½♥œõÆ•ªÂJ™áð‚¸ÌÂ{iXfÁ€¶#♦HÄÌ↕–´?6± q)¼:r◘:î↕ƒ”gˆÕp5%möa¬▼$◘:î◄ƒ3)ö)☻↨HÚìÛ) œï◄ƒ3)ö)☻↨HÚìÃ↑J0àÌHߧ◘\ *°o↑AÇëÉ¥Öh¸š↕õ½x4,BçýÑÃ"/þæûŸýåóç›♠☺Q&,"2¯æ} #”♣o"^€1ÿæM▼êu♫Ïá♣Y‰ ↓◘:X&‹ÐaPI{Șý¤H²^R á ¥◘\ i³☼á%€3)ö)☻↨HÚìÃHËfÝ;òêØ#♠gRìS♦.►§Ø' 3g‰}↕À♣¢ÙW↨Ö·x4ûâ‚♂D$TAçâÑC§sϼÿ[c3 ï2Ú´ëPEMèߨ) ÐN↕¡^Y>«‹‘F Ù{ØEÌ‚Ñé#Šñ WÁ¯↔óôó→r¶ÈA´↨à™D0ƪ½ qÈA´↨Ô♣ãy›àK´úÍ♂$€∟D♣/ÀXj. I²ä♣Š#♫¢½ .↕♀Ïýæ♣↕#♫Òé♣ݤu¬§C¢ Q£Ç f♠☼~ôò“W,ƒ´×Üw1¤pü•6↕ ◄↑/‘ňÿ↑◘íéàw↕p‼„ËÄ0Y\ï♫ŽÐ™PI{qí›◘♀¾▼☺RÖw2xu°↕Âb→V5ä( ƒ¤í♣ÿ‘Ä$↑ SF¼#C. ↨ÜD0ÉÁ˜àÄ0«„^ !—Î♂0²]÷Ç♦'¯Žÿ&↨hÐ £#♫¢‚↨ü)¦ç¹Ú♂T‡∟„?Ü♠§ÄÿÙÃõ¾í▬ÿ‡☼ Žë
ÿ♠m ▼>|Êh•‼ýða¿~¤¯‡O▼®gš
}¸M↔T▼삪yaÂœ*.ÛKA♂òfPÞö˜XÏNtÁ:·Î•◄)„Ñ£ìwø{c§þ¹↨è¾pä2♥QöX?ˆÈêãy‼(Ñdƒ´rp1¢¯q·}£ ;Xªl„¶F↑𨤠%?#ÄïCX)Û"6↕Å—ç▲Nc'çê
ŒÙÙ8‚Ž[↓ä¥ŠØ ¤ÍK♀Ÿß‚ £_ûKÙ°♥I›—↑÷8↑؆mœ§^ªˆ↔HÚ¼$◘¸Ä¥CB☻↨ˆÈ{´i³¢Ïíwñ›K?zßÓ☼↓↔5g`nŸì·Ø içÃ↔M¯3{‘„SƒðÑt„´, (BŽ♣Z↕ Ǹm
gær7¾W
U♣Çigž‡?BÀí→"V¾Í¸)í¹b–Â◄Ö•♀ô¨PäJÆX§‹\ Ó&½^ Kaø+◄ ´ÙB◄&óÜ&h–↕¢Yš♠KAÒîk…nCDlS³8û,Nª¯Õ,Õ,Õ}¾cèg▬+ß×j–ÀR►Ý×F©Xm▬_gA#%‹u_«Y?KAÒîk…È$b{!÷éyº§YLåZHa&ûZÍRÍRܾ6Gº‹Ñö‹nñ/ á_☺ ̲™ðÖ¹ µlæ²Ñ*'ºlF/ðˆ°˜f=ƒ½Ú o¡š÷笒½.K·7
a»¶ÉÁ£‡?x¶Ž
/Xçk↓∟; LN∟9>qô!óÈñ‼G œ86Ù´âBÊa'↔>e-Û♂Ηì\hçh#¡Àck°#‹ÿJ←►¸G làÈnÀ0ïáàœ«Á±_•œy)×:}lªì]◘ÚxÚ>o[E»►®o$¨ºR¬–Ê▲ñÿ>_Ä|
I first tried opening the .ks-ipc file, and my computer asked what program to try opening it in, so I selected Word. It gave me all the random characters splayed out across a few pages, which was unhelpful. I then looked up what a .ks-ipc file was to see if there were any programs that could open it, but I found absolutely nothing about the existence of that file type. As far as Google is concerned, it doesn't exist. So I thought, okay, maybe it's just a weird ipc file, whatever an ipc file is, because Google does seem to know what an ipc file is. I try opening it in an ipc file converter that allows you to open and view them online. It tells me there's no readable text because it's a binary file and spits out the same random characters Word did. Did some googling and came to the conclusion that the random characters might be raw bytes, so I tried putting them into a raw bytes to string (text?) converter, but I got a few errors and it wouldn't work. The first error was there's an uneven amount of hex characters, the second was that there was an invalid UTF-8, whatever that means. I have no idea what any of this means, and I'm hoping somebody here can help me figure out what's going on. Is there any way to figure out what this says, or did my instructor just screw up?
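For what it's worth, binary content pasted through a text editor is usually damaged beyond recovery, so the characters above can't be decoded directly. If you can download the raw file, though, one quick diagnostic is to scan it for known magic numbers: the "xœ" visible near the start is, read as Windows-1252, the byte pair 0x78 0x9C, which is a very common zlib stream header. A hedged sketch in Python (the filename is a placeholder):

```python
import zlib

def try_zlib_streams(data: bytes):
    """Scan for zlib headers (0x78 0x9c) and try to inflate from each one."""
    results = []
    start = data.find(b"\x78\x9c")
    while start != -1:
        try:
            # decompressobj tolerates trailing garbage after the stream ends
            results.append(zlib.decompressobj().decompress(data[start:]))
        except zlib.error:
            pass  # false positive: the byte pair occurred inside other data
        start = data.find(b"\x78\x9c", start + 1)
    return results

# Usage (path is a placeholder for the downloaded file):
# with open("worksheet.ks-ipc", "rb") as f:
#     for chunk in try_zlib_streams(f.read()):
#         print(chunk[:200])
```

If one of the candidates inflates cleanly, the result may itself be a recognizable format (text, XML, an image), which would tell you what program the file was meant for.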
I'm trying to load pre-trained word embeddings for the Arabic language (Mazajak embeddings: http://mazajak.inf.ed.ac.uk:8000/). The embeddings file does not have a particular extension and I'm struggling to get it to load. What's the usual process to load these embeddings?
I've tried doing with open("get_sg250", encoding=encoding) as file: file.readlines() for different encodings, but none of them seem to be the answer (utf-8 does not work at all); if I try windows-1256 I get gibberish:
e.g.
['8917028 300\n',
'</s> Hل®:0\x16ء:؟X§؛R8ڈ؛\xa0سî9K\u200fƒ::m¤9¼»“8¤p\u200c؛tعA:UU¾؛“_ع9‚Nƒ¹®G§¹قفگ؛ww$؛\u200eba:\x14.„:R¸پ:0–\x0b:–ü\x06:×#¦؛Yٍ²؛m ظ:{\x14¦:µ\x01‡:ه\x17S¹Yr¯:j\x03-¹ff€9×£P¸\n',
'W‚؛UUه9¼»é¹""§؛\u200c¶د:UU؟:\u200eb؟¹{\x14\u200d¸,ù19ïî\u200d؛ئ\x12¯؛\x00\x00ا:\u200c6°7A§a؛ذé„؛ذi†؛®G\x14:حجŒ8\x03\u200cè9ه\x17¸؛ق]¦؛ڈآ5¸قفا9حج^:\x00€ٹ؛q=²:\x00\x00¢9\x14®أ9×£T¹لz‚:\x1bèG؛®G7؛ڑ™<:m\xa0ƒ¹""´9\x14®\x1d:"¢²؛®G-؛ڑ™~:±ن¸:\x18ث«:¸\x1e…؛`,8؛Hل\u200d¹±ن.:\x1f…¥؛لْ‚:ڑ™s:R¸\x0b؛ئ’\x07؛0–C؛ڈآ¸:ذéھ:ة/خ¹A\'¸:ڑ™ز:m\xa0\x1e:è´ظ::ي‡؛\n',
'×\x05؛Œ%8؛ش\x06~؛أُu:\x00\x00\n',
":‰ˆ\x149\x14®?؛ِ(\x05:«ھ…:)\\‡833G:Haط؛\x1f…¼:¼»'9\x00\x00 ؛=\n",
'6؛R¸‚¹¼;€؛\x1bè¾؛\x1bèw؛قف؛:A§\x1a؛""j؛K~J:Hل\x14؛ىرد:\u200c6\x0c؛–|ب؛‚Nm:cةد·:mک؛‰ˆھ9\x00\x00ü9DD(¹ذi\x1f:ذé¬؛,ù™9¼»\x1e:wwƒ؛\x03\u200cF87ذ©·×£Q؛\x1f…w؛ئ\x12ح؛\x00\x00\x007ٍ‹U8\x0etZ6“ك«؛cةط؛Haد؛–ü¼؛33?¹Œ%َ9أُخ9=\n',
'‹؛ق]ع:ڈآ/؛0–ق¹¤pُ¹Dؤخ:¤p¤؛\x1bèت9\u200ebé¹ùE‹:–üb7=ٹ؛:؟Xv؛×£c؛ِ(·؛è4\xa0؛cة‹؛0\x16ˆ؛ئ’U:""#؛ة/j:R8،:أُى9ذé€:ىQX:\x1f…L:""›؛K\u200f•؛ڈآں؛‰ˆ8¸ww´:""o؛è´…؛\n',
'W·؛¤pگ:{”¶؛\x0etJ¹\u200eb>:ùإة؛`¬أ؛ِ(ü9K\u200f™:‚N؛:لz;:ِ(ٹ:Œ¥ˆ؛§\n',
'ں؛ِ¨\xad:ڑ™q؛\u200c6\x19:×£H9¤p\x1c:\x03\u200cخ¹–üٹ8UU\x13؛Hلؤ¹è´ء؛ïnژ؛®Gک:è´¯9\x0etN؛O\x1b\x0b؛\x00\x00Z:\n',
'Wڑ؛""J؛؟طخ:\x03\u200c¹:لْ¬؛\u200c6ک9ڑ™D؛\x1bèT8ق]ƒ:¼»س:0–-:~±³:,y‰:è´،¸jƒأ:m\xa0]:A\'د:j\x03\x15؛Haد:""½:wwù¹ه\x17ء؛×#س:&؟œ9×£5؛Hلz¹\\ڈ€¹)\\¨؛O\x1bْ¹ه\x17\x1b¹ڈB×؛\x03\u200c™؛ىQز¹لz¤¹ذi\x1c:\\ڈژ9ùإV¹R¸€:ùإü9ww?9‰\x08\u200d:~±ؤ¹‚Nù¹‰ˆ\x10¹UUn؛\x11\x11ƒ؛ٍ‹چ8‰ˆ½:\x1bèî¹O\x1bè¶`¬´؛=\n',
'¢:\n',
I've also tried using pickle, but that doesn't work either.
Any suggestions on what I could try out?
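The header line "8917028 300\n" followed by undecodable bytes is exactly the shape of the binary word2vec format: a text header giving vocabulary size and dimensionality, then one record per word consisting of the word, a space, and 300 raw little-endian float32s. That is why every text encoding yields gibberish. If gensim is available, gensim.models.KeyedVectors.load_word2vec_format("get_sg250", binary=True) should load it directly. As a stdlib-only illustration of the layout (a sketch, not a hardened parser; the function name is mine):

```python
import struct

def read_word2vec_bin(f, limit=None):
    """Parse the binary word2vec format: a text header line
    '<vocab_size> <dim>\n', then vocab_size records, each a
    space-terminated word followed by dim little-endian float32s."""
    vocab_size, dim = map(int, f.readline().split())
    n = vocab_size if limit is None else min(limit, vocab_size)
    vectors = {}
    for _ in range(n):
        word = bytearray()
        while True:
            ch = f.read(1)
            if ch in (b" ", b""):
                break
            if ch != b"\n":  # some writers emit a newline between records
                word.extend(ch)
        vec = struct.unpack("<%df" % dim, f.read(4 * dim))
        vectors[word.decode("utf-8", errors="replace")] = vec
    return dim, vectors

# Usage (limit keeps memory sane while you inspect the file):
# with open("get_sg250", "rb") as f:
#     dim, vecs = read_word2vec_bin(f, limit=10)
```

With 8,917,028 vectors of 300 float32s the full file is over 10 GB of vector data, so the limit parameter (or gensim's own limit argument) is worth using while you experiment.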
Collaborator uses DiffMerge to compare files. It provides a means to add rulesets, but nothing is provided for XML files. I'd like to be able to compare files without including the comments. I can get comments on a single line to behave with \<!--.*--\>; multiline comments are not working.
Better, but not close to perfect. XML really needs ...
In any case, creating a Custom Context for the multi-line comments does exclude those comments from the testing of "this changed".
Ruleset: XML Files
Suffixes: xml runsettings config
Line Match Handling: [0x00000010]
Ignore/Strip EOLs: true
Ignore/Fold Case: true
Strip Whitespace: true
Also Treat TABs as Whitespace: true
Default Context Guidelines: [0x0000001a]
Classify Differences as Important: true
EOL differences are important: N/A
Case differences are important: true
Whitespace differences are important: false
Treat TABs as Whitespace: true
Custom Contexts: [1 contexts]
Context[0]: Comment: \<!-- to --\> (Escape character \)
Guidelines: [0x0000001b]
Classify Differences as Important: false
EOL differences are important: N/A
Case differences are important: N/A
Whitespace differences are important: N/A
Treat TABs as Whitespace: N/A
Character Encoding:
Automatically detect Unicode BOM: true
Fallback Handling: Use System Local/Default
Lines To Omit: [3 patterns]
LOmit[0]: Each Line Matching: ^[[:blank:]]*$
LOmit[1]: Each Line Matching: \f
LOmit[2]: Each Line Matching: \<!--.*--\>
The important parts are the context start \<!--, the end --\>, and the escape character \; also realize that the ignored content does not get grayed out.
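If DiffMerge's line-oriented rules can't be coaxed into spanning lines, another workaround is to strip the comments in a pre-processing pass and diff the stripped copies instead. A sketch in Python (regex-based, so it is naive about comment-like text inside CDATA sections; adequate for typical config files):

```python
import re

def strip_xml_comments(text: str) -> str:
    """Remove <!-- ... --> comments, including multi-line ones."""
    # DOTALL lets '.' cross line boundaries; '.*?' keeps it non-greedy
    # so each comment ends at its own '-->'.
    return re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)

# Usage: write stripped copies of both inputs and compare those.
```

The non-greedy match is the important detail: a greedy `.*` would swallow everything from the first `<!--` to the last `-->` in the file.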
How can I display, as RTF, a string read from the database?
When it is loaded, the richedit shows the string with its raw tags: /par {ansistring.......
I tried using this code but the result is the same.
rtfString := set1.FieldByName('corpo_rtf').AsAnsiString;
stream := TMemoryStream.Create;
try
  stream.Write(PAnsiChar(rtfString)^, Length(rtfString));
  stream.Position := 0;
  corpo.PlainText := False;
  corpo.Lines.LoadFromStream(stream);
finally
  stream.Free;
end;
Your code works fine if the content of the memory stream is valid RTF. Ergo, that cannot be the case.
You need to dig deeper into the actual content of the memory stream. Write it out to a text file with .rtf extension. Try to load it with Wordpad. See what happens. You should see the same as your Delphi application displays.
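For example, a throwaway debugging line (the output path is arbitrary):

```pascal
// Dump the exact bytes that LoadFromStream will see, then open the
// result in WordPad to check whether it is valid RTF.
stream.Position := 0;
stream.SaveToFile('C:\temp\debug.rtf');
stream.Position := 0;
```

Valid RTF starts with {\rtf, so even a glance at the dumped file in a plain text editor will tell you a lot.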
So, where could this be going wrong? Some possible causes include:
The data in the database is not valid RTF.
There are some undesired text conversions being performed. We assume that rtfString is of type AnsiString. Is it?
I downloaded a data set which is supposed to be in RDF format (http://iw.rpi.edu/wiki/Dataset_1329). I opened it using Notepad++ but can't read it. Any suggestions?
Uncompressed, the file is about 140 MB, and Notepad++ is probably failing due to its size. The RDF serialization used in this dataset is N-Triples: one triple per line, with three components (subject, predicate, object), which is very human-readable. Sample data from the file:
<http://data-gov.tw.rpi.edu/raw/1329/data-1329-00017.rdf#entry8389> <http://data-gov.tw.rpi.edu/vocab/p/1329/race_other_multi_racial> "0" .
<http://data-gov.tw.rpi.edu/raw/1329/data-1329-00017.rdf#entry8389> <http://data-gov.tw.rpi.edu/vocab/p/1329/race_black_and_white> "0" .
<http://data-gov.tw.rpi.edu/raw/1329/data-1329-00017.rdf#entry8389> <http://data-gov.tw.rpi.edu/vocab/p/1329/national_origin_hispanic> "0" .
<http://data-gov.tw.rpi.edu/raw/1329/data-1329-00017.rdf#entry8389> <http://data-gov.tw.rpi.edu/vocab/p/1329/filed_cases> "1" .
If you want to have a look at the data then try to open it with a tool that streams the file rather than loading it all at once, for instance less or head.
If you want to use the data you might want to look into loading it in a triple store (4store, Virtuoso, Jena TDB, ...) and use SPARQL to query it.
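To illustrate the streaming approach, here is a hedged Python sketch that reads triples line by line instead of loading the whole 140 MB at once; the regex only covers the simple shapes shown in the sample above (for real work, a proper parser such as rdflib is safer):

```python
import re

# Matches '<s> <p> <o> .' and '<s> <p> "literal" .' lines; the object
# is captured raw, so it may be a <uri> or a "quoted literal".
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')

def stream_triples(lines):
    """Yield (subject, predicate, object) lazily from an N-Triples source."""
    for line in lines:
        m = TRIPLE.match(line)
        if m:
            yield m.group(1), m.group(2), m.group(3)

# Usage: the file object streams, so memory use stays flat.
# with open("data-1329-00017.nt", encoding="utf-8") as f:
#     for s, p, o in stream_triples(f):
#         ...
```

Because the function is a generator over a file object, it never holds more than one line in memory, which is the same property that makes less and head usable on files this size.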
Try Google Refine (possibly with RDF extension: http://lab.linkeddata.deri.ie/2010/grefine-rdf-extension/ )