I'm trying to execute this simple job in Apache Flink.
public class StreamingJob {
public static void main(String[] args) throws Exception {
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
Properties inputProperties = new Properties();
ObjectMapper mapper = new ObjectMapper();
DataStream<String> eventStream = env
.addSource(new FileSourceFunction("/path/to/file"));
DataStream<ObjectNode> eventStreamObject = eventStream
.map(x -> mapper.readValue(x, ObjectNode.class));
DataStream<ObjectNode> eventStreamWithTime = eventStreamObject
.assignTimestampsAndWatermarks(new AscendingTimestampExtractor<ObjectNode>() {
@Override
public long extractAscendingTimestamp(ObjectNode element) {
String data = element.get("ts").asText();
if(data.endsWith("Z")) {
data = data.substring(0, data.length() -1);
}
return LocalDateTime.parse(data).toEpochSecond(ZoneOffset.UTC);
}
});
eventStreamObject.print();
env.execute("Local job");
}
}
FileSourceFunction is a custom SourceFunction
public class FileSourceFunction implements SourceFunction<String> {
/**
*
*/
private static final long serialVersionUID = 1L;
private String fileName;
private volatile boolean isRunning = true;
public FileSourceFunction(String fileName) {
this.fileName = fileName;
}
@Override
public void run(SourceContext<String> ctx) throws Exception {
// TODO Auto-generated method stub
try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
try (Stream<String> stream = br.lines()) {
Iterator<String> it = stream.iterator();
while (isRunning && it.hasNext()) {
synchronized (ctx.getCheckpointLock()) {
ctx.collect(it.next());
}
}
}
}
}
@Override
public void cancel() {
isRunning = false;
}
}
When I run the job, it throws a StackOverflowError. I'm using Apache Flink 1.8.1.
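A side note on the timestamp extractor above, separate from the StackOverflowError: Flink event timestamps are epoch milliseconds, while toEpochSecond returns seconds. A minimal sketch of a millisecond conversion in plain Java follows. The class name TimestampParsing and the sample values are assumptions for illustration, and it assumes the "ts" field is an ISO-8601 date-time string that may end in "Z".

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

class TimestampParsing {
    // Parses an ISO-8601 local date-time (an optional trailing 'Z' is dropped,
    // as in the question) and returns epoch milliseconds, treating it as UTC.
    static long toEpochMillis(String ts) {
        String data = ts.endsWith("Z") ? ts.substring(0, ts.length() - 1) : ts;
        return LocalDateTime.parse(data).toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    public static void main(String[] args) {
        // Hypothetical sample values, not taken from the original input file.
        System.out.println(toEpochMillis("1970-01-01T00:00:01Z"));
        System.out.println(toEpochMillis("1970-01-01T00:00:01.500"));
    }
}
```

Inside extractAscendingTimestamp, the return statement could call such a helper with element.get("ts").asText().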
I'm working with Apache Flink and using the ConnectedStreams mechanism. Here is my code:
public class StreamingJob {
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> control = env.fromElements("DROP", "IGNORE");
DataStream<String> streamOfWords = env.fromElements("Apache", "DROP", "Flink", "IGNORE");
control
.connect(streamOfWords)
.flatMap(new ControlFunction())
.print();
env.execute();
}
public static class ControlFunction extends RichCoFlatMapFunction<String, String, String> {
private boolean found;
@Override
public void open(Configuration config) {
this.found = false;
}
@Override
public void flatMap1(String control_value, Collector<String> out) throws Exception {
if (control_value.equals("DROP")) {
this.found = true;
} else {
this.found = false;
}
}
@Override
public void flatMap2(String data_value, Collector<String> out) throws Exception {
if (this.found) {
out.collect(data_value);
this.found = false;
} else {
// nothing to do
}
}
}
}
As you can see, I use a boolean variable to control how the stream is processed. The variable found is read and written in both flatMap1 and flatMap2, so I'm wondering whether I need to worry about thread safety.
Does ConnectedStreams guarantee thread safety? If not, do I need to lock the variable found in flatMap1 and flatMap2?
The calls to flatMap1() and flatMap2() are guaranteed to not overlap, so you don't need to worry about concurrent access to your class's variables.
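Non-overlapping calls do not make the result deterministic, though: as far as I know, Flink gives no control over the order in which elements of the two connected streams reach flatMap1 and flatMap2. The plain-Java sketch below has no Flink dependency. ControlLogic and its list-based output are hypothetical stand-ins for the ControlFunction in the question. It replays the same elements in two different arrival orders to show how the output can change.

```java
import java.util.ArrayList;
import java.util.List;

class ControlLogic {
    private boolean found = false;
    private final List<String> out = new ArrayList<>();

    // Mirrors flatMap1 from the question: remember whether the last control value was "DROP".
    void flatMap1(String controlValue) {
        found = controlValue.equals("DROP");
    }

    // Mirrors flatMap2: emit the data value only while the flag is set, then reset the flag.
    void flatMap2(String dataValue) {
        if (found) {
            out.add(dataValue);
            found = false;
        }
    }

    List<String> output() {
        return out;
    }

    // Arrival order 1: both control elements arrive before any data element.
    static List<String> controlFirst() {
        ControlLogic f = new ControlLogic();
        f.flatMap1("DROP");
        f.flatMap1("IGNORE");
        for (String w : new String[] {"Apache", "DROP", "Flink", "IGNORE"}) {
            f.flatMap2(w);
        }
        return f.output();
    }

    // Arrival order 2: one data element slips in between the two control elements.
    static List<String> interleaved() {
        ControlLogic f = new ControlLogic();
        f.flatMap1("DROP");
        f.flatMap2("Apache");
        f.flatMap1("IGNORE");
        for (String w : new String[] {"DROP", "Flink", "IGNORE"}) {
            f.flatMap2(w);
        }
        return f.output();
    }

    public static void main(String[] args) {
        System.out.println(controlFirst());
        System.out.println(interleaved());
    }
}
```

So no locking is needed, but the logic should not rely on a particular interleaving of the two inputs.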
I'm trying to fetch specific data from the database, but every time I get the following error:
java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to com.lglsys.entity.TDasProductDownload
So this is my QueryService class
@Dependent
public class QueryService {
List<TDasProductDownload> downloadLink = new ArrayList();
final private Logger logger =
LogManager.getLogger(QueryService.class.getName());
@PersistenceContext(unitName="DownloadServices")
EntityManager em;
public QueryService() { super(); }
public List<TDasProductDownload> findAllDownloadLinks() {
try {
downloadLink=
em.createQuery(queryForDownloadLinks,TDasProductDownload.class)
.getResultList();
return downloadLink;
} catch (Exception e) {
logger.info(e.toString());
return null;
}
}
}
The program throws the error in this class, the EndPoint class:
public class PreControlWSEndPoint {
private Session session;
final private Logger logger = LogManager.getLogger(PreControlWSEndPoint.class.getName());
List<TDasProductDownload> downloadLink = new ArrayList();
@PersistenceContext(unitName="DownloadServices")
EntityManager em;
@Inject
QueryService service;
@OnOpen
public void Open(Session session) throws IOException, InterruptedException {
this.session = session;
this.sendMessage("Connection Oppened");
logger.info("EndPoint Opened");
try {
downloadLink = service.findAllDownloadLinks();
logger.info(downloadLink.size());
TDasProductDownload str = downloadLink.get(0);
logger.info(str.getDownloadStatus()); // Error line
} catch (Exception e) {
logger.info(e.toString() + " .D");
}
}
@OnMessage
public void onMessage(String message) {}
@OnClose
public void Close() {}
}
I can't see what's happening in my code.
I fixed it!
public List<String> findAllDownloadLinks() {
try {
downloadLink=
em.createQuery(queryForDownloadLinks,String.class)
.getResultList();
return downloadLink;
} catch (Exception e) {
logger.info(e.toString());
return null;
}
}
Then I can print the results like so:
for(int temp=0;temp<downloadLink.size();temp++){
logger.info(downloadLink.get(temp));
}
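For context on the original error: "[Ljava.lang.Object; cannot be cast to ..." means a list element is actually an Object[]. With JPA that commonly happens when the query selects individual columns instead of the entity. Since queryForDownloadLinks is not shown, that part is an assumption. Because generic types are erased at runtime, the mismatch only fails when an element is read as the entity type, not when the list is returned. The sketch below reproduces that in plain Java. ErasureDemo and Download are hypothetical stand-ins, not the classes from the question.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // Hypothetical stand-in for the entity class in the question.
    static class Download {
        String getDownloadStatus() {
            return "OK";
        }
    }

    // Simulates a query layer that returns rows as Object[] (one slot per selected
    // column) through a raw List, so the compiler cannot catch the mismatch.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static List<Download> findAll() {
        List rows = new ArrayList();
        rows.add(new Object[] {"http://example.invalid/file", "DONE"});
        return rows; // unchecked: the list really holds Object[] elements
    }

    // Returns true if reading an element as Download throws ClassCastException.
    static boolean failsOnRead() {
        List<Download> links = findAll(); // no exception here, and size() works too
        try {
            Download first = links.get(0); // the compiler-inserted cast fails here
            return first == null;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(findAll().size());
        System.out.println(failsOnRead());
    }
}
```

If the whole entity is needed rather than a single String column, selecting the entity itself in the query would be the usual alternative to changing the result type.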
The following code sample does not work in Flink 1.3:
public class TumblingWindow {
public static void main(String[] args) throws Exception {
List<Content> data = new ArrayList<Content>();
data.add(new Content(1L, "Hi"));
data.add(new Content(2L, "Hallo"));
data.add(new Content(3L, "Hello"));
data.add(new Content(4L, "Hello"));
data.add(new Content(7L, "Hello"));
data.add(new Content(8L, "Hello world"));
data.add(new Content(16L, "Hello world"));
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
final StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
DataStream<Content> stream = env.fromCollection(data);
DataStream<Content> stream2 = stream.assignTimestampsAndWatermarks(
new BoundedOutOfOrdernessTimestampExtractor<Content>(Time.milliseconds(1)) {
/**
*
*/
private static final long serialVersionUID = 410512296011057717L;
@Override
public long extractTimestamp(Content element) {
return element.getRecordTime();
}
});
Table table = tableEnv.fromDataStream(stream2,
"urlKey,httpGetMessageCount,httpPostMessageCount" + ",uplink,downlink,statusCode,statusCodeCount,rowtime.rowtime");
table.window(Tumble.over("1.hours").on("rowtime").as("w")).groupBy("w, urlKey")
.select("w.start,urlKey,uplink.sum,downlink.sum,httpGetMessageCount.sum,httpPostMessageCount.sum ");
env.execute();
}
public static class Content implements Serializable {
private String urlKey;
private long recordTime;
// private String recordTimeStr;
private long httpGetMessageCount;
private long httpPostMessageCount;
private long uplink;
private long downlink;
private long statusCode;
private long statusCodeCount;
public Content() {
super();
}
public Content(long recordTime, String urlKey) {
super();
this.recordTime = recordTime;
this.urlKey = urlKey;
}
public String getUrlKey() {
return urlKey;
}
public void setUrlKey(String urlKey) {
this.urlKey = urlKey;
}
public long getRecordTime() {
return recordTime;
}
public void setRecordTime(long recordTime) {
this.recordTime = recordTime;
}
public long getHttpGetMessageCount() {
return httpGetMessageCount;
}
public void setHttpGetMessageCount(long httpGetMessageCount) {
this.httpGetMessageCount = httpGetMessageCount;
}
public long getHttpPostMessageCount() {
return httpPostMessageCount;
}
public void setHttpPostMessageCount(long httpPostMessageCount) {
this.httpPostMessageCount = httpPostMessageCount;
}
public long getUplink() {
return uplink;
}
public void setUplink(long uplink) {
this.uplink = uplink;
}
public long getDownlink() {
return downlink;
}
public void setDownlink(long downlink) {
this.downlink = downlink;
}
public long getStatusCode() {
return statusCode;
}
public void setStatusCode(long statusCode) {
this.statusCode = statusCode;
}
public long getStatusCodeCount() {
return statusCodeCount;
}
public void setStatusCodeCount(long statusCodeCount) {
this.statusCodeCount = statusCodeCount;
}
}
private class TimestampWithEqualWatermark implements AssignerWithPunctuatedWatermarks<Object[]> {
/**
*
*/
private static final long serialVersionUID = 1L;
@Override
public long extractTimestamp(Object[] element, long previousElementTimestamp) {
// TODO Auto-generated method stub
return (long) element[0];
}
@Override
public Watermark checkAndGetNextWatermark(Object[] lastElement, long extractedTimestamp) {
return new Watermark(extractedTimestamp);
}
}
}
It raises the following exception:
Exception in thread "main" org.apache.flink.table.api.TableException: The rowtime attribute can only be replace a field with a valid time type, such as Timestamp or Long.
at org.apache.flink.table.api.StreamTableEnvironment$$anonfun$validateAndExtractTimeAttributes$1.apply(StreamTableEnvironment.scala:450)
at org.apache.flink.table.api.StreamTableEnvironment$$anonfun$validateAndExtractTimeAttributes$1.apply(StreamTableEnvironment.scala:440)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at org.apache.flink.table.api.StreamTableEnvironment.validateAndExtractTimeAttributes(StreamTableEnvironment.scala:440)
at org.apache.flink.table.api.StreamTableEnvironment.registerDataStreamInternal(StreamTableEnvironment.scala:401)
at org.apache.flink.table.api.java.StreamTableEnvironment.fromDataStream(StreamTableEnvironment.scala:88)
at com.taiwanmobile.cep.noc.TumblingWindow.main(TumblingWindow.java:53)
But if I remove statusCodeCount from the fromDataStream call, the sample runs successfully without an exception:
Table table = tableEnv.fromDataStream(stream2,
"urlKey,httpGetMessageCount,httpPostMessageCount" + ",uplink,downlink,statusCode,rowtime.rowtime");
table.window(Tumble.over("1.hours").on("rowtime").as("w")).groupBy("w, urlKey")
.select("w.start,urlKey,uplink.sum,downlink.sum,httpGetMessageCount.sum,httpPostMessageCount.sum ");
Any suggestion?
This is a bug that is filed as FLINK-6881. As a workaround, you could define your own StreamTableSource that implements DefinedRowtimeAttribute (see also this documentation draft). A table source also nicely hides the underlying DataStream API, which makes table programs more compact.
I am trying to access GAE Memcache and Datastore APIs from Dataflow.
I have followed "How to use memcache in dataflow?" and set up the Remote API (https://cloud.google.com/appengine/docs/java/tools/remoteapi).
In my pipeline I have written:
public static void main(String[] args) throws IOException {
RemoteApiOptions remApiOpts = new RemoteApiOptions()
.server("xxx.appspot.com", 443)
.useApplicationDefaultCredential();
RemoteApiInstaller installer = new RemoteApiInstaller();
installer.install(remApiOpts);
try {
DatastoreConfigManager2.registerConfig("myconfig");
final String topic = DatastoreConfigManager2.getString("pubsub.topic");
final String stagingDir = DatastoreConfigManager2.getString("dataflow.staging");
...
bqRows.apply(BigQueryIO.Write
.named("Insert row")
.to(new SerializableFunction<BoundedWindow, String>() {
@Override
public String apply(BoundedWindow window) {
// The cast below is safe because CalendarWindows.days(1) produces IntervalWindows.
IntervalWindow day = (IntervalWindow) window;
String dataset = DatastoreConfigManager2.getString("dataflow.bigquery.dataset");
String tablePrefix = DatastoreConfigManager2.getString("dataflow.bigquery.tablenametemplate");
String dayString = DateTimeFormat.forPattern("yyyyMMdd")
.print(day.start());
String tableName = dataset + "." + tablePrefix + dayString;
LOG.info("Writing to BigQuery " + tableName);
return tableName;
}
})
where DatastoreConfigManager2 is
public class DatastoreConfigManager2 {
private static final DatastoreService DATASTORE = DatastoreServiceFactory.getDatastoreService();
private static final MemcacheService MEMCACHE = MemcacheServiceFactory.getMemcacheService();
static {
MEMCACHE.setErrorHandler(ErrorHandlers.getConsistentLogAndContinue(Level.INFO));
}
private static Set<String> configs = Sets.newConcurrentHashSet();
public static void registerConfig(String name) {
configs.add(name);
}
private static class DatastoreCallbacks {
// https://cloud.google.com/appengine/docs/java/datastore/callbacks
@PostPut
public void updateCacheOnPut(PutContext context) {
Entity entity = context.getCurrentElement();
if (configs.contains(entity.getKind())) {
String id = (String) entity.getProperty("id");
String value = (String) entity.getProperty("value");
MEMCACHE.put(id, value);
}
}
}
private static String lookup(String id) {
String value = (String) MEMCACHE.get(id);
if (value != null) return value;
else {
for (String config : configs) {
try {
PreparedQuery pq = DATASTORE.prepare(new Query(config)
.setFilter(new FilterPredicate("id", FilterOperator.EQUAL, id)));
for (Entity entity : pq.asIterable()) {
value = (String) entity.getProperty("value"); // use last
}
if (value != null) MEMCACHE.put(id, value);
} catch (Exception e) {
e.printStackTrace();
}
}
}
return value;
}
public static String getString(String id) {
return lookup(id);
}
}
When my pipeline runs on Dataflow I get the exception
Caused by: java.lang.NullPointerException
at com.google.appengine.api.NamespaceManager.get(NamespaceManager.java:101)
at com.google.appengine.api.memcache.BaseMemcacheServiceImpl.getEffectiveNamespace(BaseMemcacheServiceImpl.java:65)
at com.google.appengine.api.memcache.AsyncMemcacheServiceImpl.doGet(AsyncMemcacheServiceImpl.java:401)
at com.google.appengine.api.memcache.AsyncMemcacheServiceImpl.get(AsyncMemcacheServiceImpl.java:412)
at com.google.appengine.api.memcache.MemcacheServiceImpl.get(MemcacheServiceImpl.java:49)
at my.training.google.common.config.DatastoreConfigManager2.lookup(DatastoreConfigManager2.java:80)
at my.training.google.common.config.DatastoreConfigManager2.getString(DatastoreConfigManager2.java:117)
at my.training.google.mss.pipeline.InsertIntoBqWithCalendarWindow$1.apply(InsertIntoBqWithCalendarWindow.java:101)
at my.training.google.mss.pipeline.InsertIntoBqWithCalendarWindow$1.apply(InsertIntoBqWithCalendarWindow.java:95)
at com.google.cloud.dataflow.sdk.io.BigQueryIO$Write$Bound$TranslateTableSpecFunction.apply(BigQueryIO.java:1496)
at com.google.cloud.dataflow.sdk.io.BigQueryIO$Write$Bound$TranslateTableSpecFunction.apply(BigQueryIO.java:1486)
at com.google.cloud.dataflow.sdk.io.BigQueryIO$TagWithUniqueIdsAndTable.tableSpecFromWindow(BigQueryIO.java:2641)
at com.google.cloud.dataflow.sdk.io.BigQueryIO$TagWithUniqueIdsAndTable.processElement(BigQueryIO.java:2618)
Any suggestions? Thanks in advance.
EDIT: my functional requirement is building a pipeline with some configurable steps based on datastore entries.
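One possible reading of the stack trace, offered as an assumption rather than a confirmed diagnosis: RemoteApiInstaller.install() registers the Remote API for the installing thread, here the main thread that builds the pipeline, while the SerializableFunction passed to BigQueryIO runs later on Dataflow worker threads where nothing was installed. NamespaceManager.get would then find no environment. The sketch below is plain Java, with a ThreadLocal standing in for such a per-thread environment. ThreadLocalEnvDemo, install and currentNamespace are hypothetical names, not App Engine APIs.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadLocalEnvDemo {
    // Hypothetical stand-in for an API environment that is registered per thread.
    private static final ThreadLocal<String> ENV = new ThreadLocal<>();

    // Registers the environment for the calling thread only.
    static void install(String namespace) {
        ENV.set(namespace);
    }

    // Returns null when nothing was installed on the calling thread.
    static String currentNamespace() {
        return ENV.get();
    }

    // Reads the environment from a separate worker thread.
    static String readOnWorker() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<String> task = ThreadLocalEnvDemo::currentNamespace;
            return pool.submit(task).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        install("my-namespace");
        System.out.println(currentNamespace()); // value installed on the main thread
        System.out.println(readOnWorker());     // the worker thread sees no installation
    }
}
```

If that reading holds, the installation would have to happen inside the code that runs on the workers, or the configuration values would have to be read in main and passed into the function as plain serializable fields.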
I made an Android dictionary application. I created a database named "kamusJawa.sqlite" and copied it to the assets folder. I tried the code from this link: "Own Database in Assets Folder on Android Eclipse Project".
This is my database manager class:
package com.kamusJI;
public class DBHelper extends SQLiteOpenHelper{
private static String DBPATH = "/data/data/com.kamusJI/databases/";
private static String DBNAME = "kamusJawa.sqlite";
private SQLiteDatabase DBSQ;
private final Context KJICtx;
public DBHelper(Context context) throws IOException {
super(context, DBNAME, null, 1);
this.KJICtx = context;
// TODO Auto-generated constructor stub
boolean dbexist = cekDB();
if (dbexist) {
//System.out.println("Database exists");
openDB();
} else {
System.out.println("Database doesn't exist");
createDB();
}
}
public void createDB() throws IOException{
boolean dbExist = cekDB();
if(!dbExist){
this.getReadableDatabase();
try{
salinDB();
}catch (IOException e){
throw new Error("Gagal menyalin database");
}
}
}
boolean cekDB() {
//SQLiteDatabase cekDatabase = null;
boolean cekdb = false;
try{
String path = DBPATH + DBNAME;
File dbfile = new File(path);
//cekDatabase = SQLiteDatabase.openDatabase(path, null, SQLiteDatabase.OPEN_READONLY);
cekdb = dbfile.exists();
}catch(SQLException e){
System.out.println("Database tidak ada");
}
return cekdb;
//return cekDatabase !=null ? true : false;
}
private void salinDB() throws IOException{
AssetManager AM = KJICtx.getAssets();
File DbFile = new File(DBPATH+DBNAME);
InputStream in = KJICtx.getAssets().open(DBNAME);
//OutputStream out = new FileOutputStream(DbFile);
OutputStream out = new FileOutputStream("/data/data/com.kamusJI/databases/kamusJawa.sqlite");
DbFile.createNewFile();
byte[] b = new byte[1024];
int i, r;
String[] Files = AM.list("");
Arrays.sort(Files);
i= 1;
String fdb = String.format("kamusJawa.db.00%d", i);
while(Arrays.binarySearch(Files, fdb)>=0){
//InputStream in = AM.open(fdb);
while(( r = in.read(b))>0)
out.write(b,0,r);
in.close();
i++;
fdb = String.format("kamusJawa.db.00%d", i);
}
out.flush();
out.close();
}
public void openDB() throws SQLException{
String path = DBPATH+DBNAME;
DBSQ = SQLiteDatabase.openDatabase(path, null, SQLiteDatabase.OPEN_READONLY);
}
public synchronized void close(){
if(DBSQ !=null)
DBSQ.close();
super.close();
}
@Override
public void onCreate(SQLiteDatabase arg0) {
// TODO Auto-generated method stub
}
@Override
public void onUpgrade(SQLiteDatabase arg0, int arg1, int arg2) {
// TODO Auto-generated method stub
}
}
and this is my main class:
package com.kamusJI;
public class KJI extends ListActivity {
private KJI this_class = this;
String[] Menu = {"Basa Jawa", "Bahasa Indonesia", "Tambah Data"};
/** Called when the activity is first created. */
@Override
public void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.main);
setListAdapter(new ArrayAdapter<String>(this, R.layout.row, R.id.Cari, Menu));
ListView lv = getListView();
lv.setTextFilterEnabled(false);
/* Defines On Item Click callback method */
lv.setOnItemClickListener(new OnItemClickListener() {
@Override
public void onItemClick(AdapterView<?> parent, View view, int position,
long id) {
Intent action = null;
switch(position) {
case 0:
case 1:
action = new Intent(getApplicationContext(), Cari.class);
action.putExtra("MODE", position);
break;
case 2:
action = new Intent(getApplicationContext(), Tambah.class);
action.putExtra("MODE", position);
break;
case 3:
finish();
return;
}
startActivity(action);
Toast.makeText(getApplicationContext(), ((TextView) view).getText(), Toast.LENGTH_SHORT).show();
}
});
}
public void InitDatabase() {
AsyncTask<String, Void, String> InitDB = new AsyncTask<String, Void, String>() {
Dialog progress = null;
String msg;
DBHelper dbhelper;
@Override
protected void onPreExecute() {
try {
dbhelper = new DBHelper(this_class);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
if (!dbhelper.cekDB())
progress = ProgressDialog.show(this_class, "", "Installing Database.\nPlease wait.");
super.onPreExecute();
}
@Override
protected String doInBackground(String... params) {
try {
dbhelper.createDB();
msg = "Database successfully installed.";
} catch (IOException ioe) {
msg = "Database installation failed.";
}
return msg;
}
@Override
protected void onPostExecute(String result) {
super.onPostExecute(result);
if (progress!=null) {
progress.dismiss();
Toast.makeText(getApplicationContext(), result, Toast.LENGTH_SHORT).show();
}
}
};
InitDB.execute(new String());
}
}
When I run my application and then open the file explorer, I can't find /data/data/com.kamusJI/databases. Why is that?
Change your database file extension to .db.
You need special permissions like root access to read the path:
/data/data/com.package/databases
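Independently of the visibility issue, two things in the code as posted may keep the file from being written at all: InitDatabase() is never called from onCreate, and the copy loop in salinDB only runs while assets named kamusJawa.db.00N exist. A minimal sketch of a plain stream copy follows, written in standard Java so it can be tried outside Android. StreamCopy, copyTo and roundTrip are hypothetical names. On a device, the InputStream would come from getAssets().open(...) and the target file from context.getDatabasePath(...).

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;

class StreamCopy {
    // Copies everything from 'in' to 'target', creating parent directories first.
    // Returns the number of bytes written and closes both streams.
    static long copyTo(InputStream in, File target) throws IOException {
        File parent = target.getParentFile();
        if (parent != null && !parent.exists() && !parent.mkdirs()) {
            throw new IOException("Cannot create directory " + parent);
        }
        long total = 0;
        byte[] buffer = new byte[1024];
        try (InputStream source = in; OutputStream out = new FileOutputStream(target)) {
            int read;
            while ((read = source.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
        }
        return total;
    }

    // Small self-check: copies 'data' into a fresh temp directory and
    // returns the length of the resulting file.
    static long roundTrip(byte[] data) {
        try {
            File dir = Files.createTempDirectory("dbcopy").toFile();
            File target = new File(new File(dir, "databases"), "kamusJawa.sqlite");
            copyTo(new ByteArrayInputStream(data), target);
            return target.length();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new byte[3000]));
    }
}
```

Creating the parent directory explicitly avoids depending on getReadableDatabase() having created it beforehand.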