It will only give (A,B). I am generating the pairs from combinations of
the strings A, B, C and D, so the pairs (ignoring order) would be
(A,B),(A,C),(A,D),(B,C),(B,D),(C,D)
After filtering with the original condition, this reduces to
(A,B) and (C,D)
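
To make the rule concrete, here is a minimal single-machine sketch of the filter as I understand it. It is plain Python with no Spark involved, and the name filter_unused_pairs is just something I made up for illustration. It walks the pairs in order and keeps a pair only if neither string was used by an earlier kept pair, so the result depends on the input order. It does not address the distributed case.

```python
from itertools import combinations


def filter_unused_pairs(pairs):
    """Keep a pair only if neither of its strings appears in an earlier kept pair.

    Hypothetical helper for illustration; sequential and order-dependent.
    """
    used = set()   # strings that already appear in a kept pair
    kept = []
    for left, right in pairs:
        if left in used or right in used:
            continue  # one of the strings is already taken
        used.update((left, right))
        kept.append((left, right))
    return kept


# All unordered pairs drawn from A, B, C, D, in lexicographic order
all_pairs = list(combinations("ABCD", 2))
print(filter_unused_pairs(all_pairs))

# The dataset from the question quoted below
print(filter_unused_pairs([("A", "B"), ("A", "C"), ("B", "D")]))
```

The first call prints the pairs (A,B) and (C,D); the second prints only (A,B), since A and B are both taken by the time (A,C) and (B,D) are visited.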
> What would it do with the following dataset?
> (A, B)
> (A, C)
> (B, D)
>> Hi,
>> I have an RDD of pairs of strings like below:
>> (A,B)
>> (B,C)
>> (C,D)
>> (A,D)
>> (E,F)
>> (B,F)
>> I need to transform/filter this into an RDD of pairs that does not repeat
>> a string once it has been used. So something like:
>>
>> (A,B)
>> (C,D)
>> (E,F)
>> (B,C) is out because B has already been used in (A,B), (A,D) is out
>> because A (and D) has been used etc.
>> I was thinking of using a shared variable to keep track of
>> what has already been used, but that may only work for a single partition
>> and would not scale to a larger dataset.
>>
>> Is there any other efficient way to accomplish this?
