...
Name | Date | Score |
---|
Joe Jones | 1/2/03 | 88 |
joe jones | 1/2/03 | 88 |
Jane Jackson | 2/3/04 | 77 |
Jane Jackson | February 3, 2004 | 77 |
Jill Johns | 3/4/05 | 66 |
Jill Johns | 3/4/05 | 66.00 |
TransformTransformation:
...
trans |
---|
RawWrangle | true |
---|
Type | step |
---|
WrangleText | deduplicate |
---|
SearchTerm | Remove duplicate rows |
---|
|
If you use deduplicate
on remove duplicate rows on this dataset, no rows are previewed. This preview indicates that no rows will be removed as duplicates. You might need to clean up the data before you can remove any duplicate rows.
Your first step should be get your capitalization consistent. Try the following:
...
trans |
---|
RawWrangle | true |
---|
Type | step |
---|
WrangleText | set col:Name value: |
---|
|
...
proper(Name) | p01Name | Columns |
---|
p01Value | Name |
---|
p02Name | Formula |
---|
p02Value | proper(Name) |
---|
SearchTerm | Edit column with formula |
---|
|
All entries in the Name
column now appear as proper names. Next, you can clean up the score column by normalizing numeric values to the same format. Try the following:
...
trans |
---|
RawWrangle | true |
---|
Type | step |
---|
WrangleText | set col:Score value: |
---|
|
...
numformat(Score, '##.00') | p01Name | Columns |
---|
p01Value | Score |
---|
p02Name | Formula |
---|
p02Value | numformat(Score, '##.00') |
---|
SearchTerm | Edit column with formula |
---|
|
The above transform transformation normalizes the numeric formats to include two-digits after the decimal point always, which forces all numbers to be the same format. You can use the ##
format string here, too.
Use the following to fix the Date column:
...
trans |
---|
RawWrangle | true |
---|
p03Value | '2/3/04' |
---|
Type | step |
---|
WrangleText | replace col:Date with:'2/3/04' on:'February 3, 2004' |
---|
p01Name | Column |
---|
p01Value | Date |
---|
p02Name | Find |
---|
p02Value | 'February 3, 2004' |
---|
p03Name | Replace with |
---|
SearchTerm | Replace text or pattern |
---|
|
Now, you can execute the deduplicate transformyour dataset:
...
trans |
---|
RawWrangle | true |
---|
Type | step |
---|
WrangleText | deduplicate |
---|
SearchTerm | Remove duplicate rows |
---|
|
Results:
Name | Date | Score |
---|
Joe Jones | 1/2/03 | 88.00 |
Jane Jackson | 2/3/04 | 77.00 |
Jill Johns | 3/4/05 | 66.00 |
...