apache spark - Scala - How to iterate over tuples on RDD?


Question: 

I have an RDD that contains tuples like this

(A, List(2,5,6,7))

(B, List(2,8,9,10))

and I would like to get, for each list, the index of the first element where a specific condition between value and index holds. So far I have tried this on a single tuple, test, and it works fine:

test._2.zipWithIndex.indexWhere { case (v, i) => SOME_CONDITION}

I just can't find how to iterate over all the tuples in the RDD. I have tried:

val result= test._._2.zipWithIndex.indexWhere { case (v, i) => SOME_CONDITION}



1 Answer: 

First, "iterate" is the wrong concept here - it comes from the realm of imperative programming, where you actually iterate over the data structure yourself. Spark uses a functional paradigm, which lets you pass a function to handle each record in the RDD (using some higher-order function like map, foreach...).

In this case, it sounds like you want to map each element into a new element.

To map only the right-hand side of your tuples (without changing the left-hand side), you can use mapValues:

// mapValues will map the "values" (of type List[Int]) to new values (of type Int)
rdd.mapValues(list => list.zipWithIndex.indexWhere { 
  case (v, i) => someCondition(v, i) 
})
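To see what the function passed to mapValues computes for a single list, here is a minimal plain-Scala sketch with no Spark involved. The object name IndexWhereDemo, the helper firstMatchingIndex, and the concrete body of someCondition are assumptions made up for illustration; the question leaves the real condition unspecified. Note that indexWhere returns -1 when no element matches.

```scala
object IndexWhereDemo {
  // Hypothetical condition relating a value to its zero-based index;
  // substitute whatever predicate you actually need.
  def someCondition(v: Int, i: Int): Boolean = v > i + 5

  // Index of the first (value, index) pair satisfying the condition,
  // or -1 if no element satisfies it.
  def firstMatchingIndex(list: List[Int]): Int =
    list.zipWithIndex.indexWhere { case (v, i) => someCondition(v, i) }

  def main(args: Array[String]): Unit = {
    println(firstMatchingIndex(List(2, 8, 9, 10))) // 1  (8 > 1 + 5)
    println(firstMatchingIndex(List(2, 5, 6, 7)))  // -1 (no element matches)
  }
}
```

Because -1 is a valid result, downstream code may want to filter it out or convert it to an Option before using the index.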

Or, alternatively, using plain map:

rdd.map { 
  case (key, list) => (key, list.zipWithIndex.indexWhere { 
    case (v, i) => someCondition(v, i) 
  }) 
}
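For completeness, here is an end-to-end sketch of the mapValues variant running against a local Spark context. It assumes spark-core is on the classpath; the object name FirstMatchPerKey, the run helper, the app name, and the body of someCondition are all hypothetical and only there to make the example self-contained.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object FirstMatchPerKey {
  // Hypothetical condition, not taken from the question.
  def someCondition(v: Int, i: Int): Boolean = v > i + 5

  // Builds the sample RDD from the question, maps each list to the index of
  // its first matching element, and collects the result to the driver.
  def run(sc: SparkContext): Array[(String, Int)] = {
    val rdd = sc.parallelize(Seq(
      ("A", List(2, 5, 6, 7)),
      ("B", List(2, 8, 9, 10))
    ))
    rdd.mapValues { list =>
      list.zipWithIndex.indexWhere { case (v, i) => someCondition(v, i) }
    }.collect()
  }

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM; use your cluster's master URL otherwise.
    val conf = new SparkConf().setAppName("first-match-per-key").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try run(sc).foreach(println)
    finally sc.stop()
  }
}
```

With this particular condition, key A maps to -1 (no match) and key B maps to 1. collect() is used only because the sample is tiny; on a real dataset you would normally keep working with the resulting RDD instead of pulling it to the driver.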
 
