StreamingFileSink configure file name forRowFormat - apache-flink

I need to configure file names for files created by StreamingFileSink.
I use ParquetAvroWriters.forGenericRecord to create parquet files.
I discovered that I can't use .withOutputFileConfig() when I use .forBulkFormat(), although it is available when using .forRowFormat().

This started working with Flink 1.11: .withOutputFileConfig() is now available for .forBulkFormat() as well.
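For example, a minimal sketch on Flink 1.11+ (the Avro schema, output path, prefix, and suffix below are placeholders):

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.functions.sink.filesystem.OutputFileConfig;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class NamedParquetSink {
    public static void main(String[] args) {
        String schemaJson = "...";  // placeholder: your Avro schema JSON
        Schema schema = new Schema.Parser().parse(schemaJson);

        StreamingFileSink<GenericRecord> sink = StreamingFileSink
            .forBulkFormat(
                new Path("file:///tmp/out"),                  // placeholder output path
                ParquetAvroWriters.forGenericRecord(schema))
            .withOutputFileConfig(OutputFileConfig.builder()
                .withPartPrefix("my-data")                    // part file name prefix
                .withPartSuffix(".parquet")                   // part file name suffix
                .build())
            .build();
        // ... attach the sink with stream.addSink(sink) as usual
    }
}

Finished part files then follow the pattern <prefix>-<subtaskIndex>-<counter><suffix>.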

Related

Flink table api writes empty parquet

I'm trying to write parquet files with the Flink Table API, but the written files are empty. I use StreamExecutionTable and write the files to my local filesystem. When I write JSON, the files are written correctly.
I have tried enabling checkpointing and adding rolling-policy properties, but nothing helped.
The version is 1.14.
Thanks :)
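Parquet is a bulk format, so part files are only finalized when a checkpoint actually completes; it is worth verifying that checkpoints really run and complete in the job. A minimal sketch of a checkpointed filesystem parquet sink for comparison (the schema, interval, and path are placeholders, not your actual setup):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class CheckpointedParquetSink {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Bulk formats commit part files only when a checkpoint completes; without
        // checkpointing the directory holds only in-progress (seemingly empty) files.
        env.enableCheckpointing(10_000L); // placeholder interval: 10s

        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);
        tEnv.executeSql(
            "CREATE TABLE parquet_sink (id BIGINT, name STRING) WITH (" + // placeholder schema
            " 'connector' = 'filesystem'," +
            " 'path' = 'file:///tmp/out'," +                              // placeholder path
            " 'format' = 'parquet')");
        // ... then INSERT INTO parquet_sink ... as in your job
    }
}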

How to follow an updating local file while using Flink

As mentioned in the documentation:
For example a data pipeline might monitor a file system directory for new files and write their data into an event log. Another application might materialize an event stream to a database or incrementally build and refine a search index.
So, how can I follow updates to a file on the local file system while using Flink?
The documentation also mentions that:
File system sources for streaming is still under development. In the future, the community will add support for common streaming use cases, i.e., partition and directory monitoring.
Does this mean I could use the API to do this kind of streaming? If you know how to use a streaming file system source, please tell me. Thanks!
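One API that exists today is StreamExecutionEnvironment.readFile with FileProcessingMode.PROCESS_CONTINUOUSLY, which re-scans a path at a fixed interval. A minimal sketch (the directory and interval are placeholders); note that when a file is modified, Flink re-reads its entire contents, not just the appended part:

import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

public class FollowLocalFile {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        String path = "file:///tmp/watched"; // placeholder file or directory to monitor

        // Re-scan the path every second and emit the lines of new or modified files.
        DataStream<String> lines = env.readFile(
            new TextInputFormat(new Path(path)),
            path,
            FileProcessingMode.PROCESS_CONTINUOUSLY,
            1000L); // monitoring interval in milliseconds

        lines.print();
        env.execute("follow-local-file");
    }
}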

How to automate an upload process with talend whenever a file is moved into a specific folder

I have zero experience with ETL.
Whenever a file (a .csv) is moved into a specific folder, it should be uploaded to Salesforce. I don't know how to build this automated flow.
I hope I was clear enough.
I have to use the open-source version; any helpful links or resources will be appreciated.
Thank you in advance
You could definitely use Talend Open Studio for ESB: this studio contains 'Routes' functionality. You'll be able to use a cFile component, which will check your folder for new files and raise an event that propagates through the route to a designated endpoint (for example, a Salesforce API). Talend ESB maps Apache Camel components, which are well documented.
Check out Routes with Talend for ESB; it should do the trick.
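Since Talend ESB routes map to Apache Camel, the underlying idea looks roughly like the following Camel Java DSL sketch (the folder path and the upload endpoint are placeholders; the actual Salesforce step depends on your integration):

import org.apache.camel.builder.RouteBuilder;

public class CsvToSalesforceRoute extends RouteBuilder {
    @Override
    public void configure() {
        // Poll the folder for new .csv files (placeholder path).
        from("file:/data/inbox?include=.*\\.csv")
            .log("Picked up ${file:name}")
            // Hand the file off to your Salesforce upload step
            // (hypothetical endpoint name).
            .to("direct:uploadToSalesforce");
    }
}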
We have the tFileExists component; you can use that and configure it to check for the file.
You also have the tFileWait component, where you can define the time frame for the arrival of the files and the number of iterations it has to check for the file.
But I would suggest, if you have any scheduling tool, using its file-watcher concept and then using a Talend job to upload the file to a specific location.
Using Talend itself to check for file arrival is not really feasible, as the job has to stay running continuously, which consumes more Java resources.

solr index java source files as text

I want to upload lots of source files (say, java) to solr to allow indexed search on them.
They should be posted as plain text files.
No special parsing is required.
When trying to upload a single java file I get an "Unknown Source" related error:
java.lang.NoClassDefFoundError: com/uwyn/jhighlight/renderer/XhtmlRendererFactory
When I rename the file adding .txt in the end, it is uploaded successfully.
I have thousands of files to upload on a daily basis and need to keep original names.
How do I tell solr to treat all files in the directory as .txt?
Thanks in advance!
For googlers, concerning the Solr error:
java.lang.NoClassDefFoundError: com/uwyn/jhighlight/renderer/XhtmlRendererFactory
You can correct this by adding the jar jhighlight-1.0.jar to Solr. To do so:
Download the old Solr 4.9; in recent versions, jhighlight is not present.
Extract solr-4.9.0\contrib\extraction\lib\jhighlight-1.0.jar
Copy jhighlight-1.0.jar to the solr installation under solr/server/lib/ext/
Restart the server.
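Alternatively, you can sidestep the type detection entirely by forcing the content type to text/plain when posting, so the file keeps its original name but is parsed as plain text. A SolrJ sketch (the Solr URL, core name, and literal.id field are assumptions):

import java.io.File;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class PostJavaAsPlainText {
    public static void main(String[] args) throws Exception {
        // Placeholder Solr URL and core name.
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/sources").build();

        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        // Forcing text/plain makes Tika take the plain-text path instead of the
        // source-code highlighter, so jhighlight is never needed and the file
        // keeps its original name.
        req.addFile(new File("Foo.java"), "text/plain");
        req.setParam("literal.id", "Foo.java"); // assumed unique-id field
        req.setParam("commit", "true");
        solr.request(req);
        solr.close();
    }
}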
You can achieve the same by integrating Solr with Tika.
Apache Tika will help you to extract the text of the source files.
It has a source-code parser which supports C, C++ and Java.
Here is the link which will give you more details:
https://tika.apache.org/1.12/formats.html

Send multiple files into solr

I want to send multiple files to Solr using curl. How can I do it?
I can do it with only one file, with a command like this, for example:
curl "http://localhost:8983/solr/update/extract?literal.id=paas2&commit=true" -F "file=@cloud.pdf"
Can anyone help me?
Thanks
The API does not support passing multiple files for extraction;
usually the last file is the only one that gets uploaded and added.
You can have individual files indexed as separate entities in Solr (see the sketch below),
or one way to upload multiple files is to zip them and upload the zip file.
There is an issue with Solr indexing zip files; you can try the SOLR-2332 patch.
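For the separate-documents route, a simple loop that posts each file as its own document works; a SolrJ sketch (the URL, core name, input directory, and id scheme are assumptions):

import java.io.File;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class PostManyFiles {
    public static void main(String[] args) throws Exception {
        // Placeholder Solr URL/core and input directory.
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/docs").build();
        for (File f : new File("/data/pdfs").listFiles()) {
            ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
            req.addFile(f, "application/pdf");
            req.setParam("literal.id", f.getName()); // assumed id scheme
            solr.request(req);
        }
        solr.commit(); // single commit after all files are posted
        solr.close();
    }
}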
I'm using Apache Solr 4.0 Beta, which is able to upload multiple files and generate an id for each uploaded file using post.jar; it was very helpful for me.
See:
http://wiki.apache.org/solr/ExtractingRequestHandler#SimplePostTool_.28post.jar.29
Thanks all :)
My problem has been solved :)
