How can I use other environments - artificial-intelligence

OpenAI's Universe is an awesome library. The following code
# coding: utf-8
import gym
import universe  # register the universe environments

env = gym.make('flashgames.DuskDrive-v0')
env.configure(remotes=1)  # automatically creates a local docker container
observation_n = env.reset()

while True:
    action_n = [[('KeyEvent', 'ArrowUp', True)] for ob in observation_n]  # your agent here
    observation_n, reward_n, done_n, info = env.step(action_n)
    env.render()
provides the "DuskDrive-v0" environment. How can I use other environments?

Just go to this page:
https://universe.openai.com/envs#flash_games
Select a game, copy the title of the game:
flashgames.HarvestDay-v0
and swap it in for the environment ID in your code, replacing:
env = gym.make('flashgames.DuskDrive-v0')
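With that single substitution (a minimal sketch; the rest of the script above is assumed to stay unchanged), the line becomes:
env = gym.make('flashgames.HarvestDay-v0')  # or any other environment ID from that page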

Related

Using SourceFunction and SinkFunction in PyFlink

I am new to PyFlink. I have done the official training exercise in Java: https://github.com/apache/flink-training
However, the project I am working on must use Python as a programming language. I want to know if it is possible to write a data generator using the "SourceFunction". In older PyFlink versions this was possible, using Jython: https://nightlies.apache.org/flink/flink-docs-release-1.7/dev/stream/python.html#streaming-program-example
In newer examples the dataframe contains a finite set of data, which is never extended. I have not found any example of a data generator in PyFlink, e.g. https://github.com/apache/flink-training/blob/master/common/src/main/java/org/apache/flink/training/exercises/common/sources/TaxiRideGenerator.java
I am not sure what functionality the SourceFunction and SinkFunction interfaces provide. Can they be used somehow in Python, or can they only be used in combination with other pipelines or jar files? It looks like the methods "run()" and "cancel()" are not implemented, and thus they cannot be used like some other classes, by overriding them.
If they cannot be used in Python, are there any other ways to use them? An easy example would help.
If it is not possible to use it, are there any other ways to write a data generator in OOP style? Take this example: https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/python/datastream_tutorial/ There the split() method is used to separate the stream. Basically, I want to do this by an extra class and just extending the stream, which was done in the Java TaxiRide example via "ctx.collect()". I am trying to avoid using Java, another framework for the pipeline, and Jython. It would be nice to get a short example code, but I appreciate any tips and advice.
I tried to use SourceFunction directly, but as already mentioned, I think this is the wrong approach; it results in an error: AttributeError: 'DataGenerator' object has no attribute '_get_object_id'
class DataGenerator(SourceFunction):
    def __init__(self):
        super().__init__(self)
        self._num_iters = 1000
        self._running = True

    def run(self, ctx):
        counter = 0
        while self._running and counter < self._num_iters:
            ctx.collect('Hello World')
            counter += 1

    def cancel(self):
        self._running = False
Solution:
After looking at some older code that uses the SourceFunction and SinkFunction classes (there, a Kafka connector written in Java is used), I came to a solution. The Python code below can be taken as an example of how to use PyFlink's SourceFunction and SinkFunction.
I have only written an example for the SourceFunction:
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream import SourceFunction
from pyflink.java_gateway import get_gateway


class TaxiRideGenerator(SourceFunction):
    def __init__(self):
        java_src_class = get_gateway().jvm.org.apache.flink.training.exercises.common.sources.TaxiRideGenerator
        java_src_obj = java_src_class()
        super(TaxiRideGenerator, self).__init__(java_src_obj)


def show(ds, env):
    # this is just a little helper to show the output of the pipeline
    ds.print()
    env.execute()


def streaming():
    # set up the Flink streaming execution environment
    env = StreamExecutionEnvironment.get_execution_environment()
    env.set_parallelism(1)
    taxi_src = TaxiRideGenerator()
    ds = env.add_source(taxi_src)
    show(ds, env)


if __name__ == "__main__":
    streaming()
The second line in the class init was hard to find. I had expected to get an object in the first line.
You have to create a jar file after building this project.
I changed into the build output directory until I could see the folder "org":
$ cd flink-training/flink-training/common/build/classes/java/main
flink-training/common/build/classes/java/main$ ls
org
flink-training/common/build/classes/java/main$ jar cvf flink-training.jar org/apache/flink/training/exercises/common/**/*.class
Copy the jar file to the pyflink/lib folder, normally under your python environment, e.g. flinkenv/lib/python3.8/site-packages/pyflink/lib. Then start the script.
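As a small aside (not from the original answer): one hedged way to find that folder is to ask Python where the installed pyflink package lives and print its lib directory:
import os
import pyflink

# prints the pyflink "lib" folder of the active environment; copy flink-training.jar into it
print(os.path.join(os.path.dirname(os.path.abspath(pyflink.__file__)), "lib"))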

How do you manage DSNs on Windows with Python 3?

I prefer to connect to databases with DSNs. I'm not a fan of putting user names and passwords in code or config files nor do I appreciate the trusted connection approach.
When I google how to MANAGE DSNs with Python, I wind up with some variation of the below.
import ctypes

ODBC_ADD_DSN = 1         # Add data source
ODBC_CONFIG_DSN = 2      # Configure (edit) data source
ODBC_REMOVE_DSN = 3      # Remove data source
ODBC_ADD_SYS_DSN = 4     # Add a system DSN
ODBC_CONFIG_SYS_DSN = 5  # Configure a system DSN
ODBC_REMOVE_SYS_DSN = 6  # Remove a system DSN


def create_sys_dsn(driver, **kw):
    """Create a system DSN

    Parameters:
        driver - ODBC driver name
        kw - Driver attributes
    Returns:
        0 - DSN not created
        1 - DSN created
    """
    nul = chr(32)
    attributes = []
    for attr in kw.keys():
        attributes.append("%s=%s" % (attr, kw[attr]))
    if ctypes.windll.ODBCCP32.SQLConfigDataSource(0, ODBC_ADD_SYS_DSN, driver, nul.join(attributes)):
        return True
    else:
        print(ctypes.windll.ODBCCP32.SQLInstallerError)
        return False


if __name__ == "__main__":
    if create_sys_dsn("SQL Server", SERVER="server name", DESCRIPTION="SQL Server DSN", DSN="SQL SERVER DSN", Database="ODS", Trusted_Connection="Yes"):
        print("DSN created")
    else:
        print("DSN not created")
When I run this, I wind up with this as output:
<_FuncPtr object at 0x0000028E274C8930>
I have two issues.
I'm not used to working with the OS through an API and I can't find much documentation or usage examples that aren't in C++. That said, the code runs, it just never returns true.
I can't figure out how to get it to kick out error information so I can diagnose the problem. This code leaves no footprint in the logs so nothing shows up in event viewer as something being wrong.
How can I troubleshoot this? Am I even doing the right thing by taking this route? What exactly IS best practice for connecting to databases with Python?
There are actually two answers to this question:
1. You can do it by calling PowerShell scripts from Python.
2. You don't manage DSNs at all; you use trusted connections in your connection string.
I went with option 2 (hedged sketches of both are shown below).
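Neither sketch below is from the original answer; they only illustrate the two options. pyodbc, the "ODBC Driver 17 for SQL Server" driver name, and the server/database values are assumptions to replace with your own.
# Option 1 (sketch): create a system DSN by shelling out to PowerShell's Add-OdbcDsn cmdlet.
import subprocess

subprocess.run([
    "powershell", "-NoProfile", "-Command",
    'Add-OdbcDsn -Name "SQL SERVER DSN" -DriverName "ODBC Driver 17 for SQL Server" '
    '-DsnType System -SetPropertyValue @("Server=server name", "Database=ODS", "Trusted_Connection=Yes")',
], check=True)

# Option 2 (sketch): skip DSNs entirely and use a trusted connection string with pyodbc.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=server name;"
    "DATABASE=ODS;"
    "Trusted_Connection=yes;"
)
print(conn.cursor().execute("SELECT 1").fetchone())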

How does flink recognize hiveConfDir when running in yarn cluster

I have the following code to test Flink and Hive integration. I submit the application via flink run -m yarn-cluster ..... The hiveConfDir is a local directory that resides on the machine from which I submit the application, so I would like to ask: how is Flink able to read this local directory when the main class is running in the cluster (yarn-cluster)? Thanks!
package org.example.app

import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.bridge.scala._
import org.apache.flink.table.catalog.hive.HiveCatalog
import org.apache.flink.types.Row

object FlinkBatchHiveTableIntegrationTest {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(1)
    val tenv = StreamTableEnvironment.create(env)

    val name = "myHiveCatalog"
    val defaultDatabase = "default"
    // how is Flink able to read this local directory?
    val hiveConfDir = "/apache-hive-2.3.7-bin/conf"
    val hive = new HiveCatalog(name, defaultDatabase, hiveConfDir)
    tenv.registerCatalog(name, hive)
    tenv.useCatalog(name)

    val sql =
      """
        select * from testdb.t1
      """.stripMargin(' ')

    val table = tenv.sqlQuery(sql)
    table.printSchema()
    table.toAppendStream[Row].print()
    env.execute("FlinkHiveIntegrationTest")
  }
}
Looks like I found the answer. The application is submitted with flink run -m yarn-cluster. In this mode, the main method of the application runs on the client side, where Hive is installed, so the Hive conf dir can be read.

Flink: How to set System Properties in TaskManager?

I have some code reading messages from Kafka, like below:
def main(args: Array[String]): Unit = {
  System.setProperty("java.security.auth.login.config", "someValue")

  val env = StreamExecutionEnvironment.getExecutionEnvironment

  val consumerProperties = new Properties()
  consumerProperties.setProperty("security.protocol", "SASL_PLAINTEXT")
  consumerProperties.setProperty("sasl.mechanism", "PLAIN")

  val kafkaConsumer = new FlinkKafkaConsumer011[ObjectNode](consumerProperties.getProperty("topic"), new JsonNodeDeserializationSchema, consumerProperties)
  val stream = env.addSource(kafkaConsumer)
}
When the source tries to read messages from Apache Kafka, the org.apache.kafka.common.security.JaasContext.defaultContext function loads the "java.security.auth.login.config" property.
But the property is only set on the JobManager, and when my job runs, the property is not loaded correctly on the TaskManager, so the source fails.
I tried to set extra JVM_OPTS like "-Dxxx=yyy", but the Flink cluster is deployed in standalone mode, and environment variables cannot be changed very often.
Is there any way to set property in TaskManager?
The file bin/config.sh of a Flink standalone cluster holds a property named DEFAULT_ENV_JAVA_OPTS.
Also, if you export JVM_ARGS="your parameters", the file bin/config.sh will pick it up via these lines:
# Arguments for the JVM. Used for job and task manager JVMs.
# DO NOT USE FOR MEMORY SETTINGS! Use conf/flink-conf.yaml with keys
# KEY_JOBM_MEM_SIZE and KEY_TASKM_MEM_SIZE for that!
if [ -z "${JVM_ARGS}" ]; then
    JVM_ARGS=""
fi

Configure Slick with Sql Server

I have a project that is currently using MySQL that I would like to migrate to SQL Server (running on Azure). I have tried a lot of combinations of configurations but always get the same generic error message:
Cannot connect to database [default]
Here is my latest configuration attempt:
slick.dbs.default.driver = "com.typesafe.slick.driver.ms.SQLServerDriver"
slick.dbs.default.db.driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
slick.dbs.default.db.url = "jdbc:sqlserver://my_host.database.windows.net:1433;database=my_db"
slick.dbs.default.db.user = "username"
slick.dbs.default.db.password = "password"
slick.dbs.default.db.connectionTimeout="10 seconds"
I have sqljdbc4.jar in my lib/ folder and have added the following to my build.sbt:
libraryDependencies += "com.typesafe.slick" %% "slick-extensions" % "3.0.0"
resolvers += "Typesafe Releases" at "http://repo.typesafe.com/typesafe/maven-releases/"
Edit: I can connect from this machine using a GUI app, so the issue is not with any of the network settings.
Edit: 5/30/2017
After the release of Slick 3.2, the driver is now in the core suite. Here are example configs for 3.2:
oracle = {
  driver = "slick.jdbc.OracleProfile$"
  db {
    host = ${?ORACLE_HOST}
    port = ${?ORACLE_PORT}
    sid = ${?ORACLE_SID}
    url = "jdbc:oracle:thin:@//"${oracle.db.host}":"${oracle.db.port}"/"${oracle.db.sid}
    user = ${?ORACLE_USERNAME}
    password = ${?ORACLE_PASSWORD}
  }
}

sqlserver = {
  driver = "slick.jdbc.SQLServerProfile$"
  db {
    host = ${?SQLSERVER_HOST}
    port = ${?SQLSERVER_PORT}
    databaseName = ${?SQLSERVER_DB_NAME}
    url = "jdbc:sqlserver://"${sqlserver.db.host}":"${sqlserver.db.port}";databaseName="${sqlserver.db.databaseName}
    user = ${?SQLSERVER_USERNAME}
    password = ${?SQLSERVER_PASSWORD}
  }
}
End Edit
I only have experience with the Oracle config, but I believe it is fairly similar. You are missing the critical $ at the end of the default driver. Also, you will need to make sure your SBT project recognizes the lib folder.
This first code snippet should be in application.conf, or whatever file you are using for your configuration:
oracle = {
  driver = "com.typesafe.slick.driver.oracle.OracleDriver$"
  db {
    host = ""
    port = ""
    sid = ""
    url = "jdbc:oracle:thin:@//"${oracle.db.host}":"${oracle.db.port}"/"${oracle.db.sid}
    user = ${?USERNAME}
    password = ${?PASSWORD}
    driver = oracle.jdbc.driver.OracleDriver
  }
}
This second section is in my build.sbt. I put my Oracle driver in the /.lib folder in the base directory, although there may be a better way.
unmanagedBase := baseDirectory.value / ".lib"
Finally, to make sure the config is loading properly: Slick's default seems to misbehave, so hopefully you get a right answer rather than a "what works for me" answer. Utilizing my config above, I can then load it using the snippet below. I found this in an example of a cake-pattern implementation and it has worked very well in multiple projects.
val dbConfig: DatabaseConfig[JdbcProfile] = DatabaseConfig.forConfig("oracle")
implicit val profile: JdbcProfile = dbConfig.driver
implicit val db: JdbcProfile#Backend#Database = dbConfig.db
This allows you to use the database, the driver for imports and will fail on compile if your configuration is wrong. Hope this helps.
Edit: I finished and realized you were working with Azure, so make sure that you can fully connect using the same settings from the same machine with a client of your choice, to confirm that all firewall and user settings are correct and that the problem truly lies in your code and not in your system configuration.
Edit 2: I wanted to make sure I didn't give you bad advice, since mine was an Oracle config, so I set it up against an AWS SQL Server. I used the sqljdbc42.jar that Microsoft provides with its JDBC install. Put that in the .lib folder, and then I had a configuration like the following. As in the example above, you could instead use environment variables, but this was just a quick proof of concept. Here is a Microsoft SQL Server config I have now tested and confirmed works:
sqlserver = {
  driver = "com.typesafe.slick.driver.ms.SQLServerDriver$"
  db {
    driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    host = ""
    port = ""
    databaseName = ""
    url = "jdbc:sqlserver://"${sqlserver.db.host}":"${sqlserver.db.port}";databaseName="${sqlserver.db.databaseName}
    user = ""
    password = ""
  }
}
