Power BI

Under Review

"R" Don't remove duplicates

Votes: 78

Gregory J. Deckler on 21 Apr 2016 00:52:48

Provide the option of not removing duplicates automatically when creating R visualizations, or provide the ability to create R datasets using the same syntax as shown in the comments when creating an R visualization.

Comments (14)

6b7252b1 c05b-4ac3-a6eb-1dbb57a49733 on 05 Jul 2020 23:48:51

RE: "R" Don't remove duplicates

This seems to have been under review for two and a half years now. Any update?

5689f9b4 8909-4f58-968f-f3cb4073b61a on 05 Jul 2020 23:42:32

RE: "R" Don't remove duplicates

Yes, it should not be there. The histogram I created is not a proper one.

3cf0c01c ea0a-48b9-bbae-5c1e5b5fc709 on 05 Jul 2020 23:40:11

RE: "R" Don't remove duplicates

Please allow us to keep duplicates. They are also part of the historical data, and we need them for our forecasting in Power BI.

eec25853 fb2b-40c8-9643-dd539e276943 on 05 Jul 2020 23:39:38

RE: "R" Don't remove duplicates

I strongly agree - it should be possible to turn off duplicate removal as an option.

982e1235 59ed-4dfe-83aa-5ba393a989ea on 05 Jul 2020 23:32:18

RE: "R" Don't remove duplicates

Please do the same for Python!

3d9af35a 6baf-4df6-b98d-fc399c54d142 on 05 Jul 2020 23:19:20

RE: "R" Don't remove duplicates

Microsoft always adds useless features that are always dumb. At the very least, it should be possible to disable this stupid feature.

e863213a fa3a-4b39-a244-ec8140c8659f on 05 Jul 2020 23:14:24

RE: "R" Don't remove duplicates

The workaround for this is to use an Index column. However, this does not work if the data comes from different tables! I would have to create a new table, with its own index column, containing the variables of interest. This would defeat the purpose of using a relational data structure, among other things.
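To illustrate why the Index column workaround keeps the duplicates, here is a minimal sketch in plain R, outside Power BI. It assumes the R visual applies something like `unique()` to the data frame before the script runs; the `sales` table and its `Amount` and `Index` columns are made up for the example.

```r
# Hypothetical data standing in for a Power BI table (names are assumptions).
sales <- data.frame(Amount = c(10, 10, 20, 20, 20, 30))

# Without an index, de-duplication collapses the repeated values.
without_index <- unique(sales)

# With an index column, every row is distinct, so unique() keeps all of them.
sales$Index <- seq_len(nrow(sales))
with_index <- unique(sales)

n_without <- nrow(without_index)  # rows left when only Amount is passed
n_with <- nrow(with_index)        # rows left when Amount and Index are passed
```

The script inside the visual would then simply ignore the `Index` column and work on `Amount` alone.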

fd1e6699 f18f-4103-8218-8e1f07c92e82 on 05 Jul 2020 23:00:20

RE: "R" Don't remove duplicates

Why does it remove duplicates by default? When performing univariate qualitative analysis, I want to be able to drop in a single qualitative field. This means I WANT duplicates and having keys complicates the analysis meaninglessly.

That automatic removal should be made an option. I believe it was added because R scripts in Power BI are limited to 150k rows. The removal of duplicates by Power BI (in what I assume is some sort of pre-processor-directive-like call, judging by the invalid code syntax shown in the editor) probably helps mitigate that limitation for certain types of data sets. Unfortunately, without the option to turn off that "pre-processor"-like call, an entire segment of potential analysis is complicated or even impossible (if the original data set has no key).

03968912 3446-412c-8b96-6ed9fee3d665 on 05 Jul 2020 22:53:58

RE: "R" Don't remove duplicates

I have no idea why this feature isn't standard behaviour - it's trivial in R to remove duplicates from a dataset if that behaviour is desired.
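As a rough illustration of how little code de-duplication takes in base R when it is actually wanted, the sketch below uses a made-up `dataset` data frame with a single `Category` column (both names are assumptions, not anything Power BI guarantees).

```r
# Hypothetical single-column data with repeated categories.
dataset <- data.frame(Category = c("A", "B", "A", "C", "B", "A"))

# Opt-in de-duplication: one call on the whole data frame.
deduplicated <- unique(dataset)

# Or flag the repeats without dropping them.
dup_flags <- duplicated(dataset$Category)

# With duplicates kept, frequency counts for a qualitative field stay meaningful.
counts <- table(dataset$Category)
```

Since the analyst can remove duplicates in one line, keeping them by default would not take anything away from those who want the current behaviour.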

10ee6323 10ac-4063-b313-966d5ae98aa8 on 05 Jul 2020 22:51:46

RE: "R" Don't remove duplicates

I want to use R to create a histogram. I add one column, and then it removes all the duplicates, which produces a completely inaccurate histogram. This is pretty silly - and potentially problematic if someone uses this without noticing.

Yes, I can add extra columns, or create an ID column, but I don't want to. I don't want the program to remove duplicates, just because it sees fit. There are times when it isn't appropriate - and as the analyst I want that choice.

Also, I want to write the simplest code, and that should involve only one column for a histogram.
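To make the histogram problem concrete, here is a small sketch in plain R that compares bin counts with and without de-duplication. The `values` vector and the bin edges are invented for the example, and `plot = FALSE` is used only so the counts can be inspected without a graphics device.

```r
# Hypothetical single numeric column with repeated values.
values <- c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4)

# One bin per distinct value.
breaks <- seq(0.5, 4.5, by = 1)

# Histogram of the data as recorded.
full <- hist(values, breaks = breaks, plot = FALSE)

# Histogram after de-duplication, as would happen if only this column were passed.
dedup <- hist(unique(values), breaks = breaks, plot = FALSE)

full$counts   # 3 2 4 1
dedup$counts  # 1 1 1 1
```

After de-duplication every bin has a count of one, so the shape of the distribution is lost entirely.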