How to define an array in hadoop partitioner - arrays

I am new to Hadoop and MapReduce programming and don't know what I should do. I want to define an array of int in a Hadoop partitioner: I want to fill this array in the main function and use its contents in the partitioner. I have tried IntWritable and an array of IntWritable, but neither worked. I also tried IntArrayWritable, but again it didn't work. I would be grateful if someone could help me. Thank you so much.
public static IntWritable[] h = new IntWritable[1];
public static void main(String[] args) throws Exception {
h[0] = new IntWritable(1);
}
public static class CaderPartitioner extends Partitioner <Text,IntWritable> {
@Override
public int getPartition(Text key, IntWritable value, int numReduceTasks) {
return h[0].get();
}
}

If you have a limited number of values, you can do it the following way.
Set the values on the Configuration object in the main method, like this:
Configuration conf = new Configuration();
conf.setInt("key1", value1);
conf.setInt("key2", value2);
Then implement the Configurable interface in your Partitioner class to receive the configuration object, and read the keys/values from it inside your Partitioner:
public class testPartitioner extends Partitioner<Text, IntWritable> implements Configurable{
Configuration config = null;
@Override
public int getPartition(Text arg0, IntWritable arg1, int arg2) {
//get your values based on the keys in the partitioner
int value = getConf().getInt("key1", 0); // second argument is the default used if "key1" is unset
//do stuff on value
return 0;
}
@Override
public Configuration getConf() {
return this.config;
}
@Override
public void setConf(Configuration configuration) {
this.config = configuration;
}
}
Supporting link:
https://cornercases.wordpress.com/2011/05/06/an-example-configurable-partitioner/
Note: if you have a large number of values stored in a file, it is better to find a way to read cached files from the job object in the Partitioner.
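A static field set in main is generally not visible to the partitioner, because the driver and the tasks usually run in separate JVMs; that is why the configuration is used to carry values across. Since the question is about a whole array rather than a couple of named values, one option is to pack the array into a single comma-separated configuration value in main and parse it back in setConf. The following is a minimal sketch of that encoding only, under stated assumptions: java.util.Properties stands in for Hadoop's Configuration so that it runs without Hadoop on the classpath, and the class name IntArrayConfig and the key "partitioner.values" are made up for illustration.

```java
import java.util.Arrays;
import java.util.Properties;
import java.util.stream.Collectors;

// Hypothetical helper: packs an int[] into one string-valued setting and back.
// Properties is only a stand-in for Hadoop's Configuration in this sketch.
final class IntArrayConfig {
    // Assumed key name, not something Hadoop defines.
    static final String KEY = "partitioner.values";

    private IntArrayConfig() {}

    // Driver side: store the array as a comma-separated string such as "3,1,4".
    static void store(Properties conf, int[] values) {
        String joined = Arrays.stream(values)
                .mapToObj(v -> Integer.toString(v))
                .collect(Collectors.joining(","));
        conf.setProperty(KEY, joined);
    }

    // Task side: read it back; a missing or empty setting yields an empty array.
    static int[] load(Properties conf) {
        String joined = conf.getProperty(KEY, "");
        if (joined.isEmpty()) {
            return new int[0];
        }
        return Arrays.stream(joined.split(","))
                .map(String::trim)
                .mapToInt(Integer::parseInt)
                .toArray();
    }
}
```

In a real job, the driver would presumably put the joined string on the job's Configuration, and the partitioner's setConf would parse it once into a field rather than on every getPartition call.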

Here's a refactored version of the partitioner. The main changes are:
Removed the main(), which isn't needed; initialization should be done in the constructor.
Removed static from the class and member variables.
public class CaderPartitioner extends Partitioner<Text,IntWritable> {
private IntWritable[] h;
public CaderPartitioner() {
h = new IntWritable[1];
h[0] = new IntWritable(1);
}
@Override
public int getPartition(Text key, IntWritable value, int numReduceTasks) {
return h[0].get();
}
}
Notes:
h doesn't need to be a Writable, unless you have additional logic not included in the question.
It isn't clear what h[] is for. Are you going to configure it? If so, the partitioner will probably need to implement Configurable so you can use a Configuration object to set the array up in some way.

Flink streaming example that generates its own data

Earlier I asked about a simple hello world example for Flink. This gave me some good examples!
However I would like to ask for a more ‘streaming’ example where we generate an input value every second. This would ideally be random, but even just the same value each time would be fine.
The objective is to get a stream that ‘moves’ with no/minimal external touch.
Hence my question:
How to show Flink actually streaming data without external dependencies?
I found how to show this by generating data externally and writing to Kafka, or by listening to a public source. However, I am trying to solve it with minimal dependencies (like starting with GenerateFlowFile in NiFi).
Here's an example. This was constructed as an example of how to make your sources and sinks pluggable. The idea being that in development you might use a random source and print the results, for tests you might use a hardwired list of input events and collect the results in a list, and in production you'd use the real sources and sinks.
Here's the job:
/*
* Example showing how to make sources and sinks pluggable in your application code so
* you can inject special test sources and test sinks in your tests.
*/
public class TestableStreamingJob {
private SourceFunction<Long> source;
private SinkFunction<Long> sink;
public TestableStreamingJob(SourceFunction<Long> source, SinkFunction<Long> sink) {
this.source = source;
this.sink = sink;
}
public void execute() throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<Long> LongStream =
env.addSource(source)
.returns(TypeInformation.of(Long.class));
LongStream
.map(new IncrementMapFunction())
.addSink(sink);
env.execute();
}
public static void main(String[] args) throws Exception {
TestableStreamingJob job = new TestableStreamingJob(new RandomLongSource(), new PrintSinkFunction<>());
job.execute();
}
// While it's tempting for something this simple, avoid using anonymous classes or lambdas
// for any business logic you might want to unit test.
public class IncrementMapFunction implements MapFunction<Long, Long> {
@Override
public Long map(Long record) throws Exception {
return record + 1 ;
}
}
}
Here's the RandomLongSource:
public class RandomLongSource extends RichParallelSourceFunction<Long> {
private volatile boolean cancelled = false;
private Random random;
@Override
public void open(Configuration parameters) throws Exception {
super.open(parameters);
random = new Random();
}
@Override
public void run(SourceContext<Long> ctx) throws Exception {
while (!cancelled) {
Long nextLong = random.nextLong();
synchronized (ctx.getCheckpointLock()) {
ctx.collect(nextLong);
}
}
}
@Override
public void cancel() {
cancelled = true;
}
}
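The source above emits values as fast as it can. To get roughly one value per second, as the question asks, a pause can be added inside the loop. The sketch below shows just that loop in plain Java so it can be run without Flink. The class name ThrottledLongGenerator, the LongConsumer standing in for SourceContext.collect, and the limit parameter are all assumptions for illustration, not Flink API.

```java
import java.util.Random;
import java.util.function.LongConsumer;

// Hypothetical plain-Java stand-in for the run()/cancel() pair of a source function.
final class ThrottledLongGenerator {
    private volatile boolean cancelled = false;
    private final Random random = new Random();

    // Emits random longs to 'sink', pausing 'intervalMillis' after each one,
    // until cancel() is called, the thread is interrupted, or 'limit' values were emitted.
    void run(LongConsumer sink, long intervalMillis, int limit) {
        int emitted = 0;
        while (!cancelled && emitted < limit) {
            sink.accept(random.nextLong());
            emitted++;
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop emitting.
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    // Asks the loop to stop before its next iteration.
    void cancel() {
        cancelled = true;
    }
}
```

In the Flink source itself, the equivalent change would probably be a Thread.sleep(1000) inside the while loop of run(), placed outside the synchronized block so that the checkpoint lock is not held while sleeping.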

what's the difference between object and primitive type when using matchers in EasyMock

//service to mock
public interface ServiceToMock {
public void operateDouble(Double dbValue);
public void operateCar(Car car);
}
//class under test
public class ClassUnderTest {
ServiceToMock service;
public void operateDouble(Double dbValue){
service.operateDouble(dbValue);
}
public void operateObject(Car car){
service.operateCar(car);
}
}
//unit test class
@RunWith(EasyMockRunner.class)
public class TestEasyMockMatcherUnderTest {
@TestSubject
private final ClassUnderTest easyMockMatcherUnderTest = new ClassUnderTest();
@Mock
private ServiceToMock mock;
@Test
public void testOperateCar() {
//record
mock.operateCar(EasyMock.anyObject(Car.class));
EasyMock.expectLastCall();
// replay
EasyMock.replay(mock);
//matcher here...
easyMockMatcherUnderTest.operateObject(EasyMock.anyObject(Car.class));
//easyMockMatcherUnderTest.operateObject(new Car());
// verify
EasyMock.verify(mock);
}
@Test
public void testOperateDouble() {
// record
mock.operateDouble(EasyMock.anyDouble());
EasyMock.expectLastCall();
// replay
EasyMock.replay(mock);
easyMockMatcherUnderTest.operateDouble(EasyMock.anyDouble());
// verify
EasyMock.verify(mock);
}
}
As the code above shows, I intend to test two methods (operateDouble and operateObject). But things are rather strange: everything runs fine in the operateDouble test, while the operateObject test fails at runtime with "IllegalStateException: 1 matchers expected, 2 recorded." If I comment out the operateDouble test, the failure goes away. So what is the difference between Double and my custom object Car, given that Double can also be considered an object? And why does the code in the operateObject test run fine when the operateDouble test is commented out?
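EasyMock.anyDouble and EasyMock.anyObject are not meant to be used in replay mode. They are used to set up your expectations in record mode. A matcher called during replay is registered but never consumed by an expectation, so it is most likely picked up by the next recorded call in the other test, which would explain the "2 recorded" count and why the failure disappears when that test is commented out.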
Use this in your first test (testOperateCar):
easyMockMatcherUnderTest.operateObject(new Car());
and something like this in your second (testOperateDouble):
easyMockMatcherUnderTest.operateDouble(1.0);
By the way, you don't need to call EasyMock.expectLastCall. It is only useful if you expect a void method to be called multiple times, for example:
mock.operateCar(EasyMock.anyObject(Car.class));
EasyMock.expectLastCall().times(3);

Autofixture, expected behavior?

Having a test similar to this:
public class myClass
{
public int speed100index = 0;
private List<int> values = new List<int> { 200 };
public int Speed100
{
get
{
return values[speed100index];
}
}
}
[TestClass]
public class UnitTest1
{
[TestMethod]
public void TestMethod1()
{
var fixture = new Fixture();
var sut = fixture.Create<myClass>();
Assert.AreEqual(sut.Speed100, 200);
}
}
I would have expected this to work, but I can see why it doesn't. How do I argue that this is not a problem with AutoFixture, but a problem with the code?
AutoFixture is giving you feedback about the design of your class. The feedback is, you should follow a more object-oriented design for this class.
Protect your private state, to prevent your class from entering an inconsistent state.
You need to make the speed100index field private to ensure it remains consistent with the values List.
Here is what I see if I run the debugger on your test:
AutoFixture assigns a random number to the speed100index field because it is public, and your list has no element at that index (53 in my run).
If you make speed100index private, AutoFixture will not reassign it and your test will pass.

Is it bad to set up singletons such that all methods are static methods?

I frequently set up singleton classes that are intended to be used by other programmers, and I find that I'm not sure if there is a preferred way to set up access to methods in those classes. The two ways I've thought to do it are:
public class MyClass {
private static MyClass instance;
public static void DoStuff( ) {
instance.DoStuffInstance( );
}
private void DoStuffInstance( ) {
// Stuff happens here...
}
}
where the usage is: MyClass.DoStuff( );
or something more like this:
public class MyClass {
public static MyClass instance;
public void DoStuff( ) {
// Stuff happens here...
}
}
where the usage is: MyClass.instance.DoStuff( );
Personally, I tend to prefer the first option. I find that having MyClass.instance all over the place is both ugly and unintuitive to remember for less experienced programmers.
Is there any good reason to prefer one of these over the other? Opinions are fine. Just curious what others think.
I've never seen a Singleton implemented this way. A typical setup might be something like this:
public class MyClass {
private static MyClass instance = null; // not final: assigned lazily in getInstance()
// Private to ensure that no other instances can be allocated.
private MyClass() {}
// Not thread safe!
public static MyClass getInstance() {
if( instance == null ) {
instance = new MyClass();
}
return instance;
}
public void DoStuff( ) {
// Stuff happens here...
}
}
This way all of the calls will be similar to instance.DoStuff(); and you will only need to define one method per "operation", rather than needing a static method and then the actual "instance" method that your first approach uses.
Also, the way you have it set up, it looks like you can call those static methods before the instance is actually initialized, which is a problem.
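The getInstance() above is marked as not thread safe. One common way to get lazy, thread-safe initialization in Java without explicit locking is the initialization-on-demand holder idiom, which relies on the JVM initializing a nested class exactly once on first use. This is a sketch rather than the answer's own code; the names SafeSingleton and Holder and the callCount field are made up for illustration.

```java
// Sketch of the initialization-on-demand holder idiom (names are illustrative).
final class SafeSingleton {
    private int callCount = 0;

    // Private to ensure that no other instances can be allocated.
    private SafeSingleton() {}

    // The JVM initializes Holder, and with it INSTANCE, once, on the first getInstance() call.
    private static final class Holder {
        static final SafeSingleton INSTANCE = new SafeSingleton();
    }

    static SafeSingleton getInstance() {
        return Holder.INSTANCE;
    }

    // Instance method guarded by the object's own lock; returns how many times it has run.
    synchronized int doStuff() {
        return ++callCount;
    }
}
```

Callers write SafeSingleton.getInstance().doStuff(). Because the instance is created before any instance method can run, the problem of calling a static method before the instance is initialized does not arise.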

saving variables wp7

What's the best way to save variables like a user ID so that they are stored and reachable from different pages in WP7?
There's the querystring method, but it can be kind of a pain to implement.
When navigating, pass the parameter like an HTTP querystring.
Then, on the other side, check if the key exists and extract the value. The downside is that if you need to pass more than one value, you have to build the string yourself, and it only supports strings.
So to pass an integer, you'd need to convert it. (And to pass a complex object, you need to pass all the pieces required to rebuild it on the other side.)
NavigationService.Navigate(new Uri("/PanoramaPage1.xaml?selected=item2", UriKind.Relative));
protected override void OnNavigatedTo(System.Windows.Navigation.NavigationEventArgs e)
{
string selected = String.Empty;
//check to see if the selected parameter was passed.
if (NavigationContext.QueryString.ContainsKey("selected"))
{
//get the selected parameter off the query string from MainPage.
selected = NavigationContext.QueryString["selected"];
}
//did the querystring indicate we should go to item2 instead of item1?
if (selected == "item2")
{
//item2 is the second item, but 0 indexed.
myPanorama.DefaultItem = myPanorama.Items[1];
}
base.OnNavigatedTo(e);
}
Here's a sample app that uses a querystring.
http://dl.dropbox.com/u/129101/Panorama_querystring.zip
An easier (and better) idea is to define a variable globally, or use a static class. In App.xaml.cs, define:
using System.Collections.Generic;
public static Dictionary<string,object> PageContext = new Dictionary<string,object>();
Then, on the first page, simply do
MyComplexObject obj;
int four = 4;
...
App.PageContext.Add("mycomplexobj",obj);
App.PageContext.Add("four",four);
Then, on the new page, simply do
MyComplexObject obj = App.PageContext["mycomplexobj"] as MyComplexObject;
int four = (int)App.PageContext["four"];
To be safe, you should probably check if the object exists:
int four = 0; // declared outside the if, since a declaration cannot be the body of an if
if (App.PageContext.ContainsKey("four"))
four = (int)App.PageContext["four"];
You may use an App-level variable (defined in App.xaml.cs) and access it from anywhere within your app. If you want it to persist, store it in Isolated Storage and read it on App launch/activate. There are helpers available to JSON serialize/deserialize your reads/writes from Isolated Storage.
Check out Jeff's post (here) on tips to use Isolated Storage.
Hope this helps!
Well "best" is always subjective, however, I think an application service is a good candidate for this sort of thing:-
public interface IPhoneApplicationService : IApplicationService
{
string Name {get; set;}
object Deactivating();
void Activating(object state);
}
public class AuthenticationService : IPhoneApplicationService
{
public static AuthenticationService Current {get; private set; }
public void StartService(ApplicationServiceContext context)
{
Current = this;
}
public void StopService()
{
Current = null;
}
public string Name {get; set;}
public object Deactivating()
{
// Return a serialisable object such as a Dictionary if necessary.
return UserID;
}
public void Activating(object state)
{
UserID = (int)state;
}
public int UserID { get; private set; }
public void Logon(string username, string password)
{
// Code here that eventually assigns to UserID.
}
}
You place an instance of this in your App.xaml:-
<Application.ApplicationLifetimeObjects>
<!--Required object that handles lifetime events for the application-->
<shell:PhoneApplicationService
Launching="Application_Launching" Closing="Application_Closing"
Activated="Application_Activated" Deactivated="Application_Deactivated"/>
<local:AuthenticationService Name="AuthServ" />
</Application.ApplicationLifetimeObjects>
Now you do need to tweak the App.xaml.cs:-
private void Application_Activated(object sender, ActivatedEventArgs e)
{
var state = PhoneApplicationService.Current.State;
foreach (var service in ApplicationLifetimeObjects.OfType<IPhoneApplicationService>())
{
if (state.ContainsKey(service.Name))
{
service.Activating(state[service.Name]);
}
}
}
private void Application_Deactivated(object sender, DeactivatedEventArgs e)
{
var state = PhoneApplicationService.Current.State;
foreach (var service in ApplicationLifetimeObjects.OfType<IPhoneApplicationService>())
{
if (state.ContainsKey(service.Name))
{
state[service.Name] = service.Deactivating();
}
else
{
state.Add(service.Name, service.Deactivating());
}
}
}
You can now access your UserID anywhere in your app with:-
AuthenticationService.Current.UserID
This general pattern can be used to maintain separation of key application-wide services (you don't load a whole bunch of unrelated properties into your App class). It also provides the hooks for maintaining state between activations, which is essential.
