The web application provides easy-to-use visual features to quickly iterate on transformations to clean your data. As needed, you can enhance your data with new columns, metadata information, or other datasets that you combine through joins and unions.

Find and Fix Errors

In the web application, it is easy to identify where there are errors in your data. What sets it apart is how you correct them:

  1. Identify missing or mismatched data by color-coded bars in column data. 
  2. Select a bar. 
  3. Suggestions are offered in a set of cards on the right panel.
  4. Click a suggestion, and immediately see the effects of the suggested transformation previewed in the data grid. 
    1. If the transformation needs tweaking, you can edit it.
    2. If the transformation is not the correct one, click another suggestion.
  5. When satisfied, add the transformation, and your sample of data is transformed.

Figure: Select errors in your data, and review AI-driven suggestions for how to correct them. Make the change on the spot.

Through this cycle of seeing, selecting, and refining, you can address basic errors in your sampled data, such as data mismatches, missing data, non-standard values, and outlier values, to improve the overall consistency and quality of your data.
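The idea behind the color-coded bars can be illustrated outside the application. The following is a minimal sketch in plain Python, not the application's own transformation language: it counts valid, mismatched, and missing values in a column, then previews a fix on a copy of the data. The function names, the sample rows, and the choice of `int` as the expected type are all assumptions made for illustration.

```python
from collections import Counter


def classify(value, expected_type=int):
    """Label one cell as 'missing', 'mismatched', or 'valid'."""
    if value is None or str(value).strip() == "":
        return "missing"
    try:
        expected_type(value)
    except (TypeError, ValueError):
        return "mismatched"
    return "valid"


def profile_column(rows, column, expected_type=int):
    """Count how many cells of a column fall into each category."""
    return Counter(classify(row.get(column), expected_type) for row in rows)


def replace_mismatched(rows, column, replacement, expected_type=int):
    """Return new rows with mismatched cells replaced.

    The input rows are left unchanged, so the result can be
    reviewed as a preview before it is accepted.
    """
    fixed = []
    for row in rows:
        new_row = dict(row)
        if classify(row.get(column), expected_type) == "mismatched":
            new_row[column] = replacement
        fixed.append(new_row)
    return fixed


# Hypothetical sample data: one missing and one mismatched quantity.
rows = [
    {"orderId": "1001", "qty": "3"},
    {"orderId": "1002", "qty": ""},
    {"orderId": "1003", "qty": "three"},
    {"orderId": "1004", "qty": "7"},
]

before = profile_column(rows, "qty")
preview = replace_mismatched(rows, "qty", "0")
after = profile_column(preview, "qty")

print("before:", dict(before))
print("after: ", dict(after))
```

Working on a copy mirrors the preview step: the original sample is untouched until you decide that the change is the one you want.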

For more information, see Find and Fix Bad Data.

Add New Data

You can add new data to your dataset through the following methods.

New columns

Create new columns in your dataset containing literal values, function outputs, or values from other columns, including extraction of values into new columns. 

Figure: Build a New Formula transformation to craft a new column of data containing custom functions or literal values.
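As a rough illustration of the three kinds of new columns described above (a literal value, a function output, and a value extracted from another column), here is a minimal sketch in plain Python. It does not use the application's formula syntax; the column names and sample rows are hypothetical.

```python
import re

# Hypothetical sample rows.
rows = [
    {"sku": "TSHIRT-0042", "price": "19.99", "qty": "2"},
    {"sku": "MUG-0007", "price": "8.50", "qty": "4"},
]


def add_columns(row):
    """Return a copy of the row with three derived columns."""
    new_row = dict(row)
    # Literal value: the same constant in every row.
    new_row["currency"] = "USD"
    # Function output: computed from two existing columns.
    new_row["total"] = round(float(row["price"]) * int(row["qty"]), 2)
    # Extraction: trailing digits of the SKU, or None if there are none.
    match = re.search(r"\d+$", row["sku"])
    new_row["skuNumber"] = match.group(0) if match else None
    return new_row


enriched = [add_columns(row) for row in rows]

for row in enriched:
    print(row)
```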

For more information, see Create Column.

Metadata

You can insert references to metadata about your data sources within your dataset. Source row and path information can be added as new data.
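To show why source row and path information is useful once several files are combined, the following sketch tags each record with where it came from. It is plain Python, not the application's metadata references; the file paths, the column names `sourcePath` and `sourceRow`, and the convention of counting data rows from 1 (header excluded) are assumptions.

```python
import csv
import io


def read_with_metadata(path, text):
    """Parse CSV text and tag each record with its path and row number."""
    reader = csv.DictReader(io.StringIO(text))
    records = []
    for row_number, row in enumerate(reader, start=1):
        record = dict(row)
        record["sourcePath"] = path
        # Assumption: rows are numbered from 1 and the header is not counted.
        record["sourceRow"] = row_number
        records.append(record)
    return records


# Hypothetical files, held in memory so the sketch is self-contained.
files = {
    "/data/2023/orders.csv": "orderId,qty\n1001,3\n1002,5\n",
    "/data/2024/orders.csv": "orderId,qty\n2001,1\n",
}

combined = []
for path, text in files.items():
    combined.extend(read_with_metadata(path, text))

for record in combined:
    print(record)
```

After the files are combined, each record can still be traced back to its original file and position.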

For more information, see Add Dataset Metadata.

Join or Union Datasets

Combine data from other sources using joins or unions.

  • A join combines two datasets based on values in one or more shared columns. For example, if both datasets contain a productId column, then the rows where the productId values match can be combined together.
  • A union appends two or more similar datasets together. Rows of the second dataset are added to the end of the first.

In the recipe that you are developing, you can apply these operations to:

  • other imported datasets
  • the outputs of recipes in the same flow
  • the outputs of recipes from other flows
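The difference between a join and a union, as described above, can be sketched in plain Python. This is an illustration only: it shows an inner join on a shared productId column, which is one of several possible join types, and the function names and sample datasets are hypothetical.

```python
def inner_join(left, right, key):
    """Combine rows from both datasets whose values in `key` match."""
    # Index the right-hand rows by key value for fast lookup.
    index = {}
    for right_row in right:
        index.setdefault(right_row[key], []).append(right_row)
    joined = []
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            joined.append({**left_row, **right_row})
    return joined


def union(*datasets):
    """Append datasets in order: later rows follow earlier rows."""
    result = []
    for dataset in datasets:
        result.extend(dict(row) for row in dataset)
    return result


# Hypothetical datasets sharing a productId column.
orders = [
    {"orderId": 1, "productId": "P1"},
    {"orderId": 2, "productId": "P2"},
    {"orderId": 3, "productId": "P9"},  # no matching product
]
products = [
    {"productId": "P1", "name": "Mug"},
    {"productId": "P2", "name": "T-shirt"},
]
more_orders = [{"orderId": 4, "productId": "P1"}]

joined = inner_join(orders, products, "productId")
all_orders = union(orders, more_orders)

print(joined)
print(all_orders)
```

The join widens rows by adding columns from the matching dataset, while the union lengthens the dataset by adding rows with the same columns.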

For more information, see Combine Datasets.

Reshape Your Data

You can change the composition of rows and columns in your dataset through transformation.

Figure: Change the structure of your data through menu-driven selection of rows, columns, and formulas.

The following types of transformations can be used to reshape or completely replace the columns and rows in your dataset:

  • Split: Split a column based on one or more known delimiters or based on index positions in the data. See Split Column Data.
  • Aggregation: Perform computations across a set of grouped rows, generating the results in a new column or a reshaped table. See Create Aggregation Calculations.
  • Pivot: Create pivot tables based on one or more calculations and selected fields. See Create Pivots.
  • Select: Select a set of columns to completely replace the current dataset. See Select Columns.
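The four reshaping operations listed above can be sketched on a small in-memory table. This is plain Python for illustration, not the application's transformations; the column names, the `|` delimiter, and the use of a sum as the aggregate are assumptions.

```python
from collections import defaultdict

# Hypothetical sample rows with a delimited "sale" column.
rows = [
    {"sale": "east|mug", "quarter": "Q1", "amount": 10},
    {"sale": "east|mug", "quarter": "Q2", "amount": 15},
    {"sale": "west|mug", "quarter": "Q1", "amount": 7},
    {"sale": "west|cap", "quarter": "Q1", "amount": 4},
]

# Split: break one column into two on a known delimiter.
split_rows = []
for row in rows:
    region, product = row["sale"].split("|", 1)
    split_rows.append({
        "region": region,
        "product": product,
        "quarter": row["quarter"],
        "amount": row["amount"],
    })

# Aggregation: sum the amount across rows grouped by region.
totals = defaultdict(int)
for row in split_rows:
    totals[row["region"]] += row["amount"]

# Pivot: one row per region, one column per quarter, summed amounts.
cells = defaultdict(lambda: defaultdict(int))
for row in split_rows:
    cells[row["region"]][row["quarter"]] += row["amount"]
quarters = sorted({row["quarter"] for row in split_rows})
pivot_table = [
    {"region": region, **{quarter: by_quarter[quarter] for quarter in quarters}}
    for region, by_quarter in cells.items()
]

# Select: keep only the listed columns, replacing the current dataset.
selected = [{"region": r["region"], "amount": r["amount"]} for r in split_rows]

print(dict(totals))
print(pivot_table)
print(selected)
```

In this sketch, a region with no sales in a quarter gets 0 in the pivot table; other tools may leave such cells empty instead.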

Nest and Unnest

Data in separate columns can be combined together into single columns as arrays or objects (maps). Similarly, columns of these object types can be expanded as new columns or new rows in your dataset. For more information, see Nested Data Basics.
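As an illustration of nesting and unnesting, the following plain-Python sketch combines two columns into an object, expands that object back into columns, and expands an array into one row per element. The function names and sample rows are hypothetical and do not reflect the application's own transformations.

```python
def nest(row, columns, target):
    """Move the given columns into a single object-valued column."""
    nested = {k: v for k, v in row.items() if k not in columns}
    nested[target] = {column: row[column] for column in columns}
    return nested


def unnest_to_columns(row, source):
    """Expand an object-valued column into one column per key."""
    flat = {k: v for k, v in row.items() if k != source}
    flat.update(row[source])
    return flat


def unnest_to_rows(row, source):
    """Expand an array-valued column into one row per element."""
    return [{**row, source: element} for element in row[source]]


# Hypothetical sample rows.
rows = [
    {"id": 1, "city": "Oslo", "zip": "0150", "tags": ["a", "b"]},
    {"id": 2, "city": "Bergen", "zip": "5003", "tags": ["c"]},
]

nested_rows = [nest(row, ["city", "zip"], "address") for row in rows]
flattened = [unnest_to_columns(row, "address") for row in nested_rows]
exploded = [out for row in rows for out in unnest_to_rows(row, "tags")]

print(nested_rows)
print(flattened)
print(exploded)
```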