Spark 2.0: Moving from RDD to Dataset

I want to adapt my Java Spark app (which currently uses RDDs for some calculations) to use Datasets instead of RDDs. I'm new to Datasets and not sure how to map each transformation to a corresponding Dataset operation.
At the moment I map them like this:
JavaSparkContext.textFile(...) -> SQLContext.read().textFile(...)
JavaRDD.filter(Function) -> Dataset.filter(FilterFunction)
JavaRDD.map(Function) -> Dataset.map(MapFunction)
JavaRDD.mapToPair(PairFunction) -> Dataset.groupByKey(MapFunction) ???
JavaPairRDD.aggregateByKey(U, Function2, Function2) -> KeyValueGroupedDataset.???
And the corresponding questions are:
Is JavaRDD.mapToPair equivalent to the Dataset.groupByKey method?
Does JavaPairRDD map to KeyValueGroupedDataset?
Which method is the equivalent of JavaPairRDD.aggregateByKey?
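For reference, the Spark 2.0 Java signatures involved (summarized from the Javadoc; treat this as a sketch rather than verbatim API):
// Dataset<T>
//   <K> KeyValueGroupedDataset<K, T> groupByKey(MapFunction<T, K> func, Encoder<K> encoder)
// KeyValueGroupedDataset<K, V>
//   <U> Dataset<U> mapGroups(MapGroupsFunction<K, V, U> f, Encoder<U> encoder)
//   Dataset<Tuple2<K, V>> reduceGroups(ReduceFunction<V> f)
So there is no literal mapToPair counterpart: the keying is expressed by groupByKey, which yields a KeyValueGroupedDataset, and the aggregation is then expressed on that grouped Dataset (a sketch follows the Dataset code below).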
Concretely, I want to port the following RDD code to a Dataset version:
JavaRDD<Article> goodRdd = ...

// Build PairRDD<<Date|Store|Transaction><Article>>
JavaPairRDD<String, Article> ArticlePairRdd = goodRdd.mapToPair(new PairFunction<Article, String, Article>() {
    public Tuple2<String, Article> call(Article article) throws Exception {
        String key = article.getKeyDate() + "|" + article.getKeyStore() + "|" + article.getKeyTransaction() + "|" + article.getCounter();
        return new Tuple2<String, Article>(key, article);
    }
});

// Aggregate distributed data -> PairRDD<<Date|Store|Transaction><owg_textOwg###owg_textOwg>>
JavaPairRDD<String, String> transactionRdd = ArticlePairRdd.aggregateByKey("",
    new Function2<String, Article, String>() {
        public String call(String oldString, Article newArticle) throws Exception {
            String articleString = newArticle.getOwg() + "_" + newArticle.getTextOwg();
            return oldString + "###" + articleString;
        }
    },
    new Function2<String, String, String>() {
        public String call(String a, String b) throws Exception {
            String c = a.concat(b);
            ...
            return c;
        }
    }
);
This is what my code looks like so far:
Dataset<Article> goodDS = ...

KeyValueGroupedDataset<String, Article> ArticlePairDS = goodDS.groupByKey(new MapFunction<Article, String>() {
    public String call(Article article) throws Exception {
        String key = article.getKeyDate() + "|" + article.getKeyStore() + "|" + article.getKeyTransaction() + "|" + article.getCounter();
        return key;
    }
}, Encoders.STRING());

// here I need something similar to aggregateByKey! Not reduceByKey, as I need to return a different data type (String) than the input (Article)
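For reference, one possible shape of the missing piece, sketched with mapGroups (untested; note that mapGroups hands each whole group to a single function call, so for very large groups a custom org.apache.spark.sql.expressions.Aggregator passed to KeyValueGroupedDataset.agg would be closer in spirit to aggregateByKey's incremental merging):

// needs java.util.Iterator, scala.Tuple2 and org.apache.spark.api.java.function.MapGroupsFunction
Dataset<Tuple2<String, String>> transactionDS = ArticlePairDS.mapGroups(
    new MapGroupsFunction<String, Article, Tuple2<String, String>>() {
        public Tuple2<String, String> call(String key, Iterator<Article> articles) throws Exception {
            // mirrors the seqFunc/combFunc pair above: Article -> "###owg_textOwg" pieces, concatenated
            StringBuilder builder = new StringBuilder();
            while (articles.hasNext()) {
                Article article = articles.next();
                builder.append("###").append(article.getOwg()).append("_").append(article.getTextOwg());
            }
            return new Tuple2<String, String>(key, builder.toString());
        }
    },
    Encoders.tuple(Encoders.STRING(), Encoders.STRING()));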

Related

EF Core could not translate my custom SQL Server STRING_AGG function

String.Join is not supported in EF Core, and I want to get a list of strings with a separator, like the SQL Server STRING_AGG function.
I tried to create a custom SQL Server function, but I get this error:
The parameter 'columnPartArg' for the DbFunction 'QueryHelper.StringAgg(System.Collections.Generic.IEnumerable`1[[System.String, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=]],System.String)' has an invalid type 'IEnumerable'. Ensure the parameter type can be mapped by the current provider.
This is my function, together with OnModelCreatingAddStringAgg which registers it in my DbContext:
public static string StringAgg(IEnumerable<string> columnPartArg, [NotParameterized] string separator)
{
    throw new NotSupportedException();
}

public static void OnModelCreatingAddStringAgg(ModelBuilder modelBuilder)
{
    var stringAggFunction = typeof(QueryHelper).GetRuntimeMethod(nameof(QueryHelper.StringAgg), new[] { typeof(IEnumerable<string>), typeof(string) });
    var stringTypeMapping = new StringTypeMapping("NVARCHAR(MAX)");
    modelBuilder
        .HasDbFunction(stringAggFunction)
        .HasTranslation(args => new SqlFunctionExpression("STRING_AGG",
            new[]
            {
                new SqlFragmentExpression((args.ToArray()[0] as SqlConstantExpression).Value.ToString()),
                args.ToArray()[1]
            },
            nullable: true,
            argumentsPropagateNullability: new[] { false, false },
            stringAggFunction.ReturnType,
            stringTypeMapping));
}
and this code runs the above function:
_context.PersonnelProjectTimeSheets
    .GroupBy(c => new { c.Date.Date, c.PersonnelId, c.Personnel.PersonnelCode, c.Personnel.FirstName, c.Personnel.LastName })
    .Select(c => new PersonnelProjectTimeOutputViewModel
    {
        IsConfirmed = c.Min(x => (int)(object)(x.IsConfirmed ?? false)) == 1,
        PersonnelDisplay = c.Key.PersonnelCode + " - " + c.Key.FirstName + " " + c.Key.LastName,
        PersonnelId = c.Key.PersonnelId,
        Date = c.Key.Date,
        ProjectName = QueryHelper.StringAgg(c.Select(x => x.Project.Name), ", "),
        TotalWorkTime = 0,
        WorkTimeInMinutes = c.Sum(x => x.WorkTimeInMinutes),
    });
I also changed my StringAgg method input to
string columnPartArg
and changed the SqlFunctionExpression of OnModelCreatingAddStringAgg to
new[]
{
    new SqlFragmentExpression((args.ToArray()[0] as SqlConstantExpression).Value.ToString()),
    args.ToArray()[1]
}
and changed my query code to
ProjectName = QueryHelper.StringAgg("Project.Name", ", ")
Now when I run the query, SQL Server cannot recognize Project, presumably because the fragment is emitted literally into the generated SQL.
I guess the parameter 'columnPartArg' of the DbFunction 'STRING_AGG' is varchar or nvarchar, right?
Most database functions and procedures do not take a table value as a parameter.
In this case, using EF Core's client evaluation is a good solution, with LINQ like below:
_context.PersonnelProjectTimeSheets
    .GroupBy(c => new { c.Date.Date, c.PersonnelId, c.Personnel.PersonnelCode, c.Personnel.FirstName, c.Personnel.LastName })
    .Select(c => new PersonnelProjectTimeOutputViewModel
    {
        IsConfirmed = c.Min(x => (int)(object)(x.IsConfirmed ?? false)) == 1,
        PersonnelDisplay = c.Key.PersonnelCode + " - " + c.Key.FirstName + " " + c.Key.LastName,
        PersonnelId = c.Key.PersonnelId,
        Date = c.Key.Date,
        ProjectName = string.Join(", ", c.Select(x => x.Project.Name)), // client evaluation
        TotalWorkTime = 0,
        WorkTimeInMinutes = c.Sum(x => x.WorkTimeInMinutes),
    });

Salesforce Code Coverage Failure. Your code coverage is 12%. You need at least 75% coverage to complete this deployment

I wanted to deploy my code to production. In this Apex code, I am calling a third-party API for an Opportunity on the click of a button, which triggers doSomething() from a VF page. I want to fix this issue and push the code below to my production org.
Here is my Apex class code:
public class DetailButtonController
{
    private ApexPages.StandardController standardController;

    public DetailButtonController(ApexPages.StandardController standardController)
    {
        this.standardController = standardController;
    }

    public PageReference doSomething()
    {
        // Apex code for handling a record from a Detail page goes here
        Id recordId = standardController.getId();
        Opportunity record = (Opportunity) standardController.getRecord();
        HttpRequest req = new HttpRequest();
        HttpResponse res = new HttpResponse();
        Http http = new Http();
        req.setEndpoint('https://mergeasy.com/merge_file');
        req.setMethod('POST');
        // convert the dates to mm/dd/yyyy
        Date dToday = record.Closing_Date__c;
        String clos_date = 'On or before ' + dToday.month() + '/' + dToday.day() + '/' + dToday.year();
        Date dAcc = record.Offer_Acceptance_Date__c;
        String acc_date = dAcc.month() + '/' + dAcc.day() + '/' + dAcc.year();
        String str1 = '' + record.Purchase_Price__c;
        String f_p_price = str1.substringBefore('.');
        String str2 = '' + record.Escrow_Deposit__c;
        String e_d_price = str2.substringBefore('.');
        String str3 = '' + record.Balance__c;
        String b_price = str3.substringBefore('.');
        if (record.Second_Seller_Name_Phone__c == null && record.Second_Seller_Email__c == null && record.Name != null && record.Company_Profile__c != null) {
            req.setBody('seller_name=' + record.Name + '&buyer_name=' + record.Company_Profile__c + '&county=' + record.County_Contract__c + '&street_address=' + record.Left_Main__Address_1__c + '&p_price=' + f_p_price + '&escrow_deposit=' + e_d_price + '&title_agent=' + record.Escrow_Agent_Name__c + '&title_address=' + record.Escrow_Address__c + '&title_phone=' + record.Escrow_Number__c + '&balance=' + b_price + '&accept_date=' + acc_date + '&closing_date=' + clos_date + '&inspection_days=' + record.Inspection_Days__c + '&special_clause=' + record.Special_Clauses__c + '&doc_id=XXXXXXXXXX&doc_name=Contract.pdf&delivery_method=docusign&sign_order=true&recipient1_email=' + record.Email__c + '&recipient1_name=' + record.Name + '&recipient2_name=' + record.Company_Profile__c + '&recipient2_email=developer.c2c@gmail.com&docusign_doc_name=Contract - Attorney Involved&email_subject=Contract:' + record.Left_Main__Address_1__c + '&email_body=Hi please sign the attached contract');
        }
        else if (record.Second_Seller_Name_Phone__c != null && record.Second_Seller_Email__c != null && record.Name != null && record.Company_Profile__c != null) {
            String name = record.Name + ' and ' + record.Second_Seller_Name_Phone__c;
            req.setBody('seller_name=' + name + '&buyer_name=' + record.Company_Profile__c + '&county=' + record.County_Contract__c + '&street_address=' + record.Left_Main__Address_1__c + '&p_price=' + f_p_price + '&escrow_deposit=' + e_d_price + '&title_agent=' + record.Escrow_Agent_Name__c + '&title_address=' + record.Escrow_Address__c + '&title_phone=' + record.Escrow_Number__c + '&balance=' + b_price + '&accept_date=' + acc_date + '&closing_date=' + clos_date + '&inspection_days=' + record.Inspection_Days__c + '&special_clause=' + record.Special_Clauses__c + '&doc_id=XXXXXXXXXX&doc_name=Contract.pdf&delivery_method=docusign&sign_order=true&recipient1_email=' + record.Email__c + '&recipient1_name=' + record.Name + '&recipient2_name=' + record.Second_Seller_Name_Phone__c + '&recipient2_email=' + record.Second_Seller_Email__c + '&recipient3_email=developer.c2c@gmail.com&recipient3_name=' + record.Company_Profile__c + '&docusign_doc_name=Contract - Normal(1S1B).pdf&email_subject=Contract:' + record.Left_Main__Address_1__c + '&email_body=Hi please sign the attached contract');
        }
        req.setHeader('Authorization', 'Bearer XXXXXXXXXXXXXX');
        try {
            res = http.send(req);
        } catch (System.CalloutException e) {
            System.debug('Callout error: ' + e);
            System.debug(res.toString());
        }
        return null;
    }
}
Here is the test class, which is showing 90% code coverage.
// testClassBt.apxc
@isTest
public class testClassBt {
    @isTest
    static void testPostCallout() {
        System.Test.setMock(HttpCalloutMock.class, new TestClass());
        Opportunity opp = new Opportunity();
        opp.Name = 'Rickson Developer';
        opp.StageName = 'Underwrite';
        opp.CloseDate = Date.newInstance(1991, 2, 21);
        opp.Closing_Date__c = Date.newInstance(1991, 2, 21);
        opp.Offer_Acceptance_Date__c = Date.newInstance(1991, 2, 21);
        opp.Purchase_Price__c = 1200.00;
        opp.Escrow_Deposit__c = 1200.00;
        opp.Company_Profile__c = 'RFTA Properties, LLC';
        opp.County_Contract__c = 'Orange';
        opp.Left_Main__Address_1__c = '123 Main Street';
        opp.Escrow_Agent_Name__c = 'Test Agent';
        opp.Escrow_Address__c = '123 Main street';
        opp.Escrow_Number__c = '9892132382';
        opp.Inspection_Days__c = 34;
        opp.Special_Clauses__c = 'Test';
        insert opp;
        ApexPages.StandardController standardController = new ApexPages.StandardController(opp);
        DetailButtonController strResp = new DetailButtonController(standardController);
        strResp.doSomething();
    }
}

// TestClass.apxc
@isTest
global class TestClass implements HttpCalloutMock {
    global HTTPResponse respond(HTTPRequest request) {
        HttpResponse response = new HttpResponse();
        response.setHeader('Content-Type', 'application/json');
        response.setBody('{"animal": {"id":1, "name":"Tiger"}}');
        response.setStatusCode(200);
        return response;
    }
}
Assuming that during the validation process you run just the test methods of this class: did you try to run your test class in a sandbox first?
Some IDEs, and the Salesforce Developer Console itself, show you the covered lines after the unit test execution.
Just follow the green lines to debug the code and understand where the exception has been thrown.
If you could post the test class too, we could help you more.

Re-index items in DSpace 6.2 after updating through REST

We are trying to build an application to provide bulk editing of item metadata ingested in DSpace, using the REST API. The update operations are reflected in the DSpace UI; however, the metadata remains unchanged in Solr unless we run index-discovery. Since we intend to work with a large amount of data, running index-discovery every time a metadata value is edited would be expensive. Could someone suggest a workaround or solution for this?
You could trigger an item update in the Java class of the REST endpoint.
For example:
In the addItemMetadata method of the Java class org.dspace.rest.ItemsResource, which represents the /items REST endpoint, you could add the following line after the item metadata has been changed:
itemService.update(context, dspaceItem);
This line of code triggers an index update for that specific item: the update fires a modification event that the Discovery event consumer picks up (assuming the discovery consumer is enabled in dspace.cfg, as it is by default).
This is what the complete addItemMetadata method will look like after the above change:
@POST
@Path("/{item_id}/metadata")
@Consumes({ MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML })
public Response addItemMetadata(@PathParam("item_id") String itemId, List<org.dspace.rest.common.MetadataEntry> metadata,
        @QueryParam("userIP") String user_ip, @QueryParam("userAgent") String user_agent,
        @QueryParam("xforwardedfor") String xforwardedfor, @Context HttpHeaders headers, @Context HttpServletRequest request)
        throws WebApplicationException
{
    log.info("Adding metadata to item(id=" + itemId + ").");
    org.dspace.core.Context context = null;
    try
    {
        context = createContext();
        org.dspace.content.Item dspaceItem = findItem(context, itemId, org.dspace.core.Constants.WRITE);
        writeStats(dspaceItem, UsageEvent.Action.UPDATE, user_ip, user_agent, xforwardedfor, headers, request, context);
        for (MetadataEntry entry : metadata)
        {
            // TODO Test with Java split
            String data[] = mySplit(entry.getKey()); // custom split, because Java's split did not work here
            if ((data.length >= 2) && (data.length <= 3))
            {
                itemService.addMetadata(context, dspaceItem, data[0], data[1], data[2], entry.getLanguage(), entry.getValue());
            }
        }
        itemService.update(context, dspaceItem); // the added line: triggers the index update
        context.complete();
    }
    catch (SQLException e)
    {
        processException("Could not write metadata to item(id=" + itemId + "), SQLException. Message: " + e, context);
    }
    catch (ContextException e)
    {
        processException("Could not write metadata to item(id=" + itemId + "), ContextException. Message: " + e.getMessage(), context);
    }
    catch (AuthorizeException e)
    {
        processException("Could not update item(id=" + itemId + "), AuthorizeException. Message: " + e.getMessage(), context);
    }
    finally
    {
        processFinally(context);
    }
    log.info("Metadata to item(id=" + itemId + ") were successfully added.");
    return Response.status(Status.OK).build();
}
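For completeness, a hypothetical client-side call to this endpoint (host, item id, and payload are placeholders, and the login step that DSpace's REST API requires is omitted):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class AddMetadataExample {
    public static void main(String[] args) throws Exception {
        // placeholder host and item id
        URL url = new URL("http://localhost:8080/rest/items/123/metadata");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        // a JSON list of MetadataEntry objects, matching what addItemMetadata expects
        String body = "[{\"key\":\"dc.title\",\"value\":\"New title\",\"language\":\"en\"}]";
        try (OutputStream os = conn.getOutputStream()) {
            os.write(body.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP " + conn.getResponseCode()); // 200 when the update succeeded
    }
}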

How to access Spans with a SpanNearQuery in Solr 6.3

I am trying to build a search component that ranks the passages containing the query terms.
I understand that I need to use SpanNearQuery, but I can't find a way to access the Spans even after going through the documentation; the method I tried returns null.
I have read https://lucidworks.com/blog/2009/07/18/the-spanquery/ which explains the query well, including how to access the Spans, but it targets Solr 4.0, and unfortunately Solr 6.3 no longer has AtomicReader.
How can I get the actual Spans?
public void process(ResponseBuilder rb) throws IOException {
    SolrParams params = rb.req.getParams();
    log.warn("in Process");
    if (!params.getBool(COMPONENT_NAME, false)) {
        return;
    }
    Query origQuery = rb.getQuery();
    // TODO: longer term, we don't have to be a span query, we could re-analyze the document
    if (origQuery != null) {
        if (origQuery instanceof SpanNearQuery == false) {
            throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
                "Illegal query type. The incoming query must be a Lucene SpanNearQuery and it was a " + origQuery.getClass().getName());
        }
        SpanNearQuery sQuery = (SpanNearQuery) origQuery;
        SolrIndexSearcher searcher = rb.req.getSearcher();
        IndexReader reader = searcher.getIndexReader();
        log.warn("before leaf reader context");
        List<LeafReaderContext> ctxs = reader.leaves();
        log.warn("after leaf reader context");
        LeafReaderContext ctx = ctxs.get(0); // note: only looks at the first segment
        SpanWeight spanWeight = sQuery.createWeight(searcher, true);
        Spans spans = spanWeight.getSpans(ctx, SpanWeight.Postings.POSITIONS); // returns null when nothing in this segment matches
        // The Solr 4-era route from the blog post no longer compiles; AtomicReader,
        // SlowCompositeReaderWrapper and getSpans(ctx, Bits, Map) are gone in Lucene 6:
        // AtomicReader wrapper = SlowCompositeReaderWrapper.wrap(reader);
        // Map<Term, TermContext> termContexts = new HashMap<Term, TermContext>();
        // Spans spans = fleeceQ.getSpans(wrapper.getContext(), new Bits.MatchAllBits(reader.numDocs()), termContexts);
        // SpanWeight.Postings[] postings = SpanWeight.Postings.values();
        // Spans spans = sQuery.getSpans();
        // Assumes the query is a SpanQuery
        // Build up the query term weight map and the bi-gram
        Map<String, Float> termWeights = new HashMap<String, Float>();
        Map<String, Float> bigramWeights = new HashMap<String, Float>();
        createWeights(params.get(CommonParams.Q), sQuery, termWeights, bigramWeights, reader);
        float adjWeight = params.getFloat(ADJACENT_WEIGHT, DEFAULT_ADJACENT_WEIGHT);
        float secondAdjWeight = params.getFloat(SECOND_ADJ_WEIGHT, DEFAULT_SECOND_ADJACENT_WEIGHT);
        float bigramWeight = params.getFloat(BIGRAM_WEIGHT, DEFAULT_BIGRAM_WEIGHT);
        // get the passages
        int primaryWindowSize = params.getInt(OWLParams.PRIMARY_WINDOW_SIZE, DEFAULT_PRIMARY_WINDOW_SIZE);
        int adjacentWindowSize = params.getInt(OWLParams.ADJACENT_WINDOW_SIZE, DEFAULT_ADJACENT_WINDOW_SIZE);
        int secondaryWindowSize = params.getInt(OWLParams.SECONDARY_WINDOW_SIZE, DEFAULT_SECONDARY_WINDOW_SIZE);
        WindowBuildingTVM tvm = new WindowBuildingTVM(primaryWindowSize, adjacentWindowSize, secondaryWindowSize);
        PassagePriorityQueue rankedPassages = new PassagePriorityQueue();
        // intersect w/ doclist
        DocList docList = rb.getResults().docList;
        log.warn("Before Spans");
        while (spans.nextDoc() != Spans.NO_MORE_DOCS) {
            // build up the window
            log.warn("Iterating through spans");
            if (docList.exists(spans.docID())) {
                tvm.spanStart = spans.startPosition();
                tvm.spanEnd = spans.endPosition();
                // tvm.terms
                Terms terms = reader.getTermVector(spans.docID(), sQuery.getField());
                tvm.map(terms, spans);
                // The entries map contains the window, do some ranking of it
                if (tvm.passage.terms.isEmpty() == false) {
                    log.debug("Candidate: Doc: {} Start: {} End: {} ", new Object[] { spans.docID(), spans.startPosition(), spans.endPosition() });
                }
                tvm.passage.lDocId = spans.docID();
                tvm.passage.field = sQuery.getField();
                // score this window
                try {
                    addPassage(tvm.passage, rankedPassages, termWeights, bigramWeights, adjWeight, secondAdjWeight, bigramWeight);
                } catch (CloneNotSupportedException e) {
                    throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Internal error cloning Passage", e);
                }
                // clear out the entries for the next round
                tvm.passage.clear();
            }
        }
    }
}
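For reference, a minimal sketch of how Spans are consumed in Lucene 6.x (using the reader and spanWeight names from the code above). Two details matter: getSpans() returns null for a segment in which the query matches nothing, which would explain the null result, and since Lucene 5 the positions must be pulled explicitly with nextStartPosition() after each nextDoc():

for (LeafReaderContext leaf : reader.leaves()) {
    Spans spans = spanWeight.getSpans(leaf, SpanWeight.Postings.POSITIONS);
    if (spans == null) {
        continue; // no matches in this segment
    }
    while (spans.nextDoc() != Spans.NO_MORE_DOCS) {
        int globalDocId = leaf.docBase + spans.docID(); // docID() is segment-local
        while (spans.nextStartPosition() != Spans.NO_MORE_POSITIONS) {
            // startPosition()/endPosition() are only valid here
            log.info("doc={} start={} end={}", globalDocId, spans.startPosition(), spans.endPosition());
        }
    }
}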

The given key was not present in the dictionary (SolrNet)

Please note: I know about the question SolrNet - The given key was not present in the dictionary, and I have initialized the Solr object just as Mauricio suggests.
I am using Solr 4.6.0 and SolrNet build #173, .NET Framework 4.0 and VS2012 for development. For some unknown reason I am receiving the error 'The given key was not present in the dictionary'. I have a document with that id in Solr; I've checked via the browser. It's a document like any other. Why is the error popping up? My code (I've marked the place where the error happens):
// establishes the connection with Solr
private void ConnectToSolr()
{
    try
    {
        if (_solr != null) return;
        Startup.Init<Register>(SolrAddress);
        _solr = ServiceLocator.Current.GetInstance<ISolrOperations<Register>>();
    }
    catch (Exception ex)
    {
        throw new Exception(ex.Message);
    }
}

// returns snippets from Solr as a BindingSource
public BindingSource GetSnippets(string searchTerm, DateTime? startDate = null, DateTime? endDate = null)
{
    ConnectToSolr();
    string dateQuery = startDate == null
        ? ""
        : endDate == null
            ? "savedate:\"" + convertDateToSolrFormat(startDate) + "\"" // only start date
            : "savedate:[" + convertDateToSolrFormat(startDate) + " TO " + convertDateToSolrFormat(endDate) + "]"; // range between start and end date
    string textQuery = string.IsNullOrEmpty(searchTerm) ? "text:*" : "text:*" + searchTerm + "*";
    List<Register> list = new List<Register>();
    SolrQueryResults<Register> results;
    string currentId = "";
    try
    {
        results = _solr.Query(textQuery,
            new QueryOptions
            {
                Highlight = new HighlightingParameters
                {
                    Fields = new[] { "*" },
                },
                ExtraParams = new Dictionary<string, string>
                {
                    { "fq", dateQuery },
                    { "sort", "savedate desc" }
                }
            });
        for (int i = 0; i < results.Highlights.Count; i++)
        {
            currentId = results[i].Id;
            var h = results.Highlights[currentId];
            if (h.Snippets.Count > 0)
            {
                list.Add(new Register // here the error "The given key was not present in the dictionary" pops up
                {
                    Id = currentId,
                    ContentLiteral = h.Snippets["content"].ToArray()[0].Trim(new[] { ' ', '\n' }),
                    SaveDateLiteral = results[i].SaveDate.ToShortDateString()
                });
            }
        }
        BindingList<Register> bindingList = new BindingList<Register>(list);
        BindingSource bindingSource = new BindingSource();
        bindingSource.DataSource = bindingList;
        return bindingSource;
    }
    catch (Exception e)
    {
        MessageBox.Show(string.Format("{0}\nId:{1}", e.Message, currentId), "Solr error");
        return null;
    }
}
I've found out what's causing the problem: saving empty documents into Solr. If I make an empty query (with text:*) through SolrNet (usually I do this when I want to see all saved documents) and an empty document is one of the saved docs, then 'The given key was not present in the dictionary' pops up. If all of the documents have text in them, this error doesn't appear.
If your document contains fields with types other than string and you index a null value into a double or integer field, you will get the same error. The Solr query returns the null field as:
<null name="fieldname"/>
when it should be
<double name="fieldname">0.0</double>
or
<double name="fieldname"/>
