...

Name          Date              Score
Joe Jones     1/2/03            88
joe jones     1/2/03            88
Jane Jackson  2/3/04            77
Jane Jackson  February 3, 2004  77
Jill Johns    3/4/05            66
Jill Johns    3/4/05            66.00

Transformation:

Search term: Remove duplicate rows
Wrangle: deduplicate

If you apply the deduplicate transformation (Remove duplicate rows) to this dataset, no rows are previewed. This preview indicates that no rows will be removed as duplicates, because no two rows match exactly. You need to clean up the data before any duplicate rows can be removed.
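
To see why nothing is removed, here is a minimal Python sketch of exact-match deduplication on the raw rows. It assumes that rows count as duplicates only when every value is character-for-character identical, which is consistent with the empty preview described above. The helper name dedupe_exact is hypothetical and is not part of Wrangle.

```python
# Rows copied from the example dataset: (Name, Date, Score) as raw strings.
rows = [
    ("Joe Jones", "1/2/03", "88"),
    ("joe jones", "1/2/03", "88"),
    ("Jane Jackson", "2/3/04", "77"),
    ("Jane Jackson", "February 3, 2004", "77"),
    ("Jill Johns", "3/4/05", "66"),
    ("Jill Johns", "3/4/05", "66.00"),
]


def dedupe_exact(rows):
    """Keep the first occurrence of each row (hypothetical helper).

    Two rows match only if every value is identical, including case.
    """
    seen = set()
    kept = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            kept.append(row)
    return kept


removed = len(rows) - len(dedupe_exact(rows))
print(removed)  # 0: each pair differs in case, date format, or number format
```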

Your first step should be to make the capitalization consistent. Try the following:

Search term: Edit column with formula
Columns: Name
Formula: proper(Name)
Wrangle: set col:Name value:proper(Name)
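
As a rough Python analogue of this step, the sketch below applies str.title() to each name. This assumes that str.title() behaves like proper() for simple two-word names such as these; the two functions may differ on names containing apostrophes or other punctuation.

```python
# Name values as they appear in the example dataset.
names = ["Joe Jones", "joe jones", "Jane Jackson", "Jane Jackson",
         "Jill Johns", "Jill Johns"]

# str.title() capitalizes the first letter of each word and lowercases the rest.
normalized = [name.title() for name in names]

print(normalized[1])         # Joe Jones
print(len(set(normalized)))  # 3 distinct names after normalization
```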

All entries in the Name column now appear as proper names. Next, you can clean up the Score column by normalizing its numeric values to the same format. Try the following:

Search term: Edit column with formula
Columns: Score
Formula: numformat(Score, '##.00')
Wrangle: set col:Score value:numformat(Score, '##.00')

The above transformation normalizes every value in the Score column to two digits after the decimal point, so all numbers share the same format. You can use the ## format string here, too.
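
The effect on the Score column can be approximated in Python with a fixed two-decimal format. This is a sketch only: the '.2f' format specifier stands in for the '##.00' format string and is not the Wrangle implementation.

```python
# Score values as raw strings from the example dataset.
scores = ["88", "88", "77", "77", "66", "66.00"]

# Parse each value as a number, then render it with exactly two decimals.
formatted = [f"{float(score):.2f}" for score in scores]

print(formatted[4], formatted[5])  # 66.00 66.00
```

After formatting, "66" and "66.00" become the same string, which is what allows the two Jill Johns rows to match later.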

Use the following to fix the Date column:

Search term: Replace text or pattern
Column: Date
Find: 'February 3, 2004'
Replace with: '2/3/04'
Wrangle: replace col:Date with:'2/3/04' on:'February 3, 2004'
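
For comparison, a literal find-and-replace on the Date values looks like this in Python. The sketch assumes a plain string match, as in the step above; it does not parse dates, so any other long-form date would need its own replacement.

```python
# Date values as they appear in the example dataset.
dates = ["1/2/03", "1/2/03", "2/3/04", "February 3, 2004", "3/4/05", "3/4/05"]

# Replace the one long-form date with the short form used by the other rows.
fixed = [date.replace("February 3, 2004", "2/3/04") for date in dates]

print(fixed[3])  # 2/3/04
```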

Now, you can remove the duplicate rows from your dataset:

Search term: Remove duplicate rows
Wrangle: deduplicate

Results:

Name          Date    Score
Joe Jones     1/2/03  88.00
Jane Jackson  2/3/04  77.00
Jill Johns    3/4/05  66.00
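
The whole sequence can be sketched end to end in Python to check the result. The function names clean_row and dedupe are hypothetical, and the Python string operations are stand-ins for proper(), numformat(), and replace, under the assumption that they behave the same way on this small dataset.

```python
# Raw rows from the example dataset: (Name, Date, Score).
raw = [
    ("Joe Jones", "1/2/03", "88"),
    ("joe jones", "1/2/03", "88"),
    ("Jane Jackson", "2/3/04", "77"),
    ("Jane Jackson", "February 3, 2004", "77"),
    ("Jill Johns", "3/4/05", "66"),
    ("Jill Johns", "3/4/05", "66.00"),
]


def clean_row(row):
    """Apply the three cleanup steps to one row (hypothetical helper)."""
    name, date, score = row
    name = name.title()                               # stand-in for proper(Name)
    score = f"{float(score):.2f}"                     # stand-in for numformat(Score, '##.00')
    date = date.replace("February 3, 2004", "2/3/04") # literal date replacement
    return (name, date, score)


def dedupe(rows):
    """Drop exact duplicates, keeping first occurrences in order."""
    return list(dict.fromkeys(rows))


result = dedupe([clean_row(row) for row in raw])
for row in result:
    print(row)
```

The three rows printed match the Results table above.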

...