
...

Visual profiling provides real-time interactive visualizations of your dataset to assist in the discovery, cleansing, and transformation of your data. Visual representations are essential for interpreting large volumes of data, and the platform's innovative profiling techniques visualize key statistical information in a dynamic, easy-to-consume format for faster transformation.

  • At the individual column level, visual profiles provide interactive statistical information visualized in an appropriate manner for the data type. For example, columns of Zip Code data type can be represented on a geographical map of the United States.
  • All visual profiles are interactive, so you can dig into the details of the data. Select one or more elements in a profile, and you can take immediate action on the data, either through steps you define or through transform recommendations provided by the platform.
  • The Transformer page displays a set of recommended actions to take based on the values, rows, or columns that you select in the data grid. These recommendations are based on platform logic and prior usage information. For more information, see Overview of Predictive Transformation.

Visual profiles are available while you transform your data in the Transformer page, when you dig into the detail of individual columns, and after you execute your job at scale. Each of these interfaces has different usage patterns designed to accelerate and simplify data transformation for that specific area of the process.

Uses

  • Locate anomalies. Visual profiling surfaces missing or invalid data in individual columns. These values can then be selected and transformed as needed.  

  • Identify distributions. In the data grid, you can review value distribution for each column in your dataset. When exploring the column details, you can also identify and select statistical outliers among your column data.

  • Correlate columns. In the column details screen, you can explore the correlations between values in one column and another. 
  • Identify areas for further refinement. After a job has completed, you can review its visual profile through the application and then take action on problematic data. 
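To make "missing or invalid data" and "statistical outliers" concrete, the sketch below tallies each value in a column as valid, mismatched, or missing, and flags numeric outliers with Tukey's interquartile-range fences. This is a minimal illustration under stated assumptions, not the platform's profiling algorithm: the function names, the caller-supplied `is_valid` rule, and the choice of the IQR method with `k=1.5` are all hypothetical.

```python
import statistics
from collections import Counter


def quality_counts(values, is_valid):
    """Tally each value as 'valid', 'mismatched', or 'missing'.

    `is_valid` is a hypothetical, caller-supplied rule for the column's data type.
    """
    counts = Counter()
    for value in values:
        text = "" if value is None else str(value).strip()
        if not text:
            counts["missing"] += 1      # null or empty cell
        elif is_valid(text):
            counts["valid"] += 1        # value fits the column's data type
        else:
            counts["mismatched"] += 1   # value present but invalid for the type
    return counts


def iqr_outliers(numbers, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(numbers, n=4)  # quartiles (Python 3.8+)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in numbers if x < low or x > high]


if __name__ == "__main__":
    cells = ["94105", "1234", None, "", "10001"]
    print(quality_counts(cells, lambda s: s.isdigit() and len(s) == 5))
    print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))
```

Keeping the validity rule as a parameter mirrors the idea that what counts as "invalid" depends on the column's data type.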

A Note about Metrics

  • In the data grid: Counts and other computations in column histograms and column details are exact measurements against the currently loaded sample.
  • In job results: Counts and other computations in the visual profile are approximations against the full dataset.

    Info

    NOTE: The computational cost of generating exact visual profiling measurements on large datasets in interactive visual profiles severely impacts performance. As a result, visual profiles across an entire dataset represent statistically significant approximations.
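The trade-off between exact and approximate metrics can be illustrated with a count-min sketch, one well-known technique for estimating value frequencies over a large dataset in fixed memory. This is a swapped-in example, not a description of how the platform computes its approximations: the class, the `width`/`depth` defaults, and the use of `hashlib.blake2b` for hashing are assumptions made for illustration. Exact counts on a small sample come from a plain `Counter`.

```python
import hashlib
from collections import Counter


class CountMinSketch:
    """Approximate frequency counts in fixed memory; estimates never undercount."""

    def __init__(self, width=2000, depth=5):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, value):
        # One independent-looking hash per row, derived by salting with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(f"{row}:{value}".encode(), digest_size=8).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, value, count=1):
        for row, col in self._buckets(value):
            self.table[row][col] += count

    def estimate(self, value):
        # Hash collisions only ever add to a cell, so the minimum is the tightest bound.
        return min(self.table[row][col] for row, col in self._buckets(value))


def exact_counts(sample):
    """Exact value counts, feasible when the data (e.g. a loaded sample) is small."""
    return Counter(sample)
```

With width w and depth d, each estimate exceeds the true count by at most about (e/w)·N with probability roughly 1 − e^(−d), where N is the total number of values added. Memory stays fixed regardless of dataset size, which is the kind of accuracy-for-cost trade the note above describes.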

...

In the following example, a dataset containing address information has been loaded in the Transformer page: 

Figure: Example dataset

 

In this example, we are interested in exploring geographic information. From the column drop-down for the Zip column, select Column Details.

Tip

Explore detail on demand. Generate visual profiles from the column drop-down. If you need to profile across multiple columns, use the Column Browser available through the left navigation panel.

When you explore the column details of the Zip column, you can see the following representation of the data:

Figure: Zip Code data type represented as a U.S. map

In this case, the values in your Zip column are recognized as being of Zip Code data type. The application then represents these values as a U.S. map, which renders numeric data in a format that is much easier to read and analyze.
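One simple way to decide that a column "is" a Zip Code column is to check what fraction of its non-empty values match a ZIP pattern, then bucket the valid values by region for display. The sketch below assumes a 5-digit or ZIP+4 pattern, a hypothetical 90% match threshold, and 3-digit prefixes as a coarse stand-in for map regions; none of these details come from the platform.

```python
import re
from collections import Counter

# Assumed pattern: 5-digit ZIP, optionally followed by a ZIP+4 suffix.
ZIP_RE = re.compile(r"^\d{5}(?:-\d{4})?$")


def looks_like_zip_column(values, threshold=0.9):
    """Guess the Zip Code type if enough non-empty values match (threshold is hypothetical)."""
    non_empty = [str(v).strip() for v in values if v is not None and str(v).strip()]
    if not non_empty:
        return False
    matches = sum(1 for v in non_empty if ZIP_RE.match(v))
    return matches / len(non_empty) >= threshold


def zip3_buckets(values):
    """Count valid ZIP codes by 3-digit prefix, a rough proxy for map regions."""
    cleaned = (str(v).strip() for v in values if v is not None)
    return Counter(v[:3] for v in cleaned if ZIP_RE.match(v))
```

Using a match ratio rather than requiring every value to match lets a column with a few bad entries still be typed correctly; those entries then surface as mismatched values in the profile.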

Tip

Type-specific visualizations. The profile of the column values is represented in a type-specific visualization to help you rapidly analyze and act on some or all values in the column.

To explore further, you can use this interactive visualization to identify how values in other columns correspond to values in the Zip column. In the Column Browser on the left side of the screen, you can select one or more values in other columns to see the corresponding Zip values. Below, the value Y has been selected from the Baked Goods column, illustrating the geographic distribution of markets where baked goods are sold:

Tip

Query brushing and linking. Select one or more data elements in the Column Browser, and the application automatically queries the platform for the corresponding values in the profiled column and a set of meaningful actions to take on the selected data.

 

Figure: Specific values in a column correlated with values in profiled column
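Query brushing and linking amounts to filtering rows by the values selected in other columns and recomputing the profiled column's distribution over just those rows. The sketch below reuses the Zip and Baked Goods column names from the example, but the row data, the function name, and the in-memory list-of-dicts representation are hypothetical; the platform issues its own queries.

```python
from collections import Counter


def linked_distribution(rows, target, selections):
    """Value counts of `target`, restricted to rows matching every selection.

    `selections` maps a column name to the set of values selected in it.
    """
    def matches(row):
        return all(row.get(col) in allowed for col, allowed in selections.items())

    return Counter(row[target] for row in rows if matches(row))


if __name__ == "__main__":
    rows = [  # hypothetical sample rows
        {"Zip": "94105", "Baked Goods": "Y"},
        {"Zip": "10001", "Baked Goods": "N"},
        {"Zip": "94105", "Baked Goods": "Y"},
        {"Zip": "60601", "Baked Goods": "Y"},
    ]
    print(linked_distribution(rows, "Zip", {"Baked Goods": {"Y"}}))
```

Selecting more values in a column widens that column's allowed set, while selecting in additional columns narrows the result, since every selection must match.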

Visual Profiling Interfaces

Wherever you can interact with data in the web application, visual profiling simplifies the process.

...

For additional details on visual transformation, see Transform Basics.

Column Details

Through the Transformer page, you can explore statistical details about individual columns, visually represented based on the column's data type. From the drop-down for any column, select Column Details.

In this interface, you can review the range of values in the column and can optionally select one or more values from other columns to see which values in the current column apply. The visualizations for a column depend on the data type. 

See Column Details Panel.

Generate Results

After results have been successfully generated and profiling has been enabled, you can explore a visualization of the generated results in the Results Summary page. See Results Summary Page.

...

  • In the Transformer page, the profile engine is called for incremental changes whenever a step is added to your recipe, so that you can see immediate updates to the visual profile for each column. It uses separate algorithms for generating the data quality bars, column histograms, value counts, frequency distributions, and other relevant statistics. When you dig into the column details, the visual profile is up-to-date and can be updated again based on your selections in that interface.
  • During job execution, it is queried as a separate job when profiling is executed across the entire dataset.
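The incremental behavior in the Transformer page can be pictured as caching the sample's state after each recipe step, so that adding a step applies only that step to the last cached state and re-profiles the result instead of replaying the whole recipe. The sketch below is a hypothetical model of that idea: `RecipeSession`, the representation of steps as Python functions over rows, and the small per-column profile (missing count plus top values) are illustrative assumptions, not the platform's engine.

```python
from collections import Counter


def profile(rows, columns):
    """Per-column missing count and top-3 value counts for the current rows."""
    result = {}
    for col in columns:
        values = [r.get(col) for r in rows]
        missing = sum(1 for v in values if v in (None, ""))
        top = Counter(v for v in values if v not in (None, "")).most_common(3)
        result[col] = {"missing": missing, "top": top}
    return result


class RecipeSession:
    """Hypothetical session that caches each step's output and its profile."""

    def __init__(self, sample, columns):
        self.columns = columns
        self.states = [sample]                      # rows after each step
        self.profiles = [profile(sample, columns)]  # profile after each step

    def add_step(self, step):
        # Apply only the new step to the most recent state, then re-profile.
        new_rows = step(self.states[-1])
        self.states.append(new_rows)
        self.profiles.append(profile(new_rows, self.columns))
        return self.profiles[-1]


if __name__ == "__main__":
    session = RecipeSession([{"Zip": "94105"}, {"Zip": ""}, {"Zip": "10001"}], ["Zip"])
    print(session.profiles[0])
    print(session.add_step(lambda rows: [r for r in rows if r["Zip"]]))  # drop empty Zips
```

Profiling the full dataset during job execution is a different problem: the data no longer fits this in-memory model, which is one reason that profiling runs as a separate task.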
Info

NOTE: When you choose to profile your results, you are creating two distinct tasks: 1) run your transform recipe against your source and 2) profile the results. Due to the computational complexity of generating the interactive results, a profiling task often takes longer to complete than a transformation task and is therefore an optional element of a job run.