
Overview


Visual profiling provides real-time, interactive visualizations of your dataset to assist in the discovery, cleansing, and transformation of your data. Visual representations make large volumes of data easier to interpret, and the platform's profiling techniques present key statistical information in a dynamic, easy-to-consume format so that you can transform your data faster.

...

Visual profiles are available while you transform your data in the Transformer page, when you dig into the detail of individual columns, and after you execute your job at scale. Each of these interfaces has different usage patterns designed to accelerate and simplify data transformation for that specific area of the process.

Uses

  • Locate anomalies. Visual profiling surfaces missing or invalid data in individual columns. These values can then be selected and transformed as needed.  

  • Identify distributions. In the data grid, you can review the distribution of values in each column of your dataset. When exploring the column details, you can also identify and select statistical outliers in the column data.

  • Identify areas for further refinement. After a job has completed, you can review its visual profile through the application and then take action on problematic data. 
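As a rough illustration of how missing and invalid values can be counted for a single column, the Python sketch below sorts each value into valid, invalid, or missing. The function names `classify` and `is_integer` are hypothetical, and treating `None` and blank strings as missing is an assumption; the platform's own data type validation is not modeled here.

```python
from collections import Counter


def is_integer(value):
    """Hypothetical validity check for an integer-typed column."""
    try:
        int(value)
        return True
    except (TypeError, ValueError):
        return False


def classify(values, is_valid):
    """Count valid, invalid, and missing values in one column.

    Assumption: None and blank strings count as missing.
    """
    counts = Counter()
    for value in values:
        if value is None or (isinstance(value, str) and not value.strip()):
            counts["missing"] += 1
        elif is_valid(value):
            counts["valid"] += 1
        else:
            counts["invalid"] += 1
    return counts


column = ["12", "7", "", None, "n/a", "42", "3.5"]
print(classify(column, is_integer))
```

In this example, `"n/a"` and `"3.5"` are counted as invalid for an integer column, while the empty string and `None` are counted as missing.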

...

In the Column Details panel, you can review the patterns detected in the values of the selected column. Selecting a pattern identifies the values in the column that match it, and you can then use that selection as the basis for building transforms that apply to the matching values. For more information, see Column Details Panel.
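Pattern profiling can be sketched by reducing each value to a coarse shape and grouping the values that share it. In this hypothetical Python sketch, digits become `9` and letters become `A`. That notation, and the helper names `value_pattern`, `pattern_profile`, and `matching`, are illustrative assumptions, not the pattern syntax used in the Column Details panel.

```python
from collections import Counter


def value_pattern(value: str) -> str:
    """Reduce a value to a coarse shape: digits -> '9', letters -> 'A'.

    Hypothetical notation; other characters are kept as-is.
    """
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def pattern_profile(values):
    """Count how many values share each pattern."""
    return Counter(value_pattern(v) for v in values)


def matching(values, pattern):
    """Select the values that match a chosen pattern."""
    return [v for v in values if value_pattern(v) == pattern]


phones = ["555-1234", "555-9876", "5551234", "call me"]
profile = pattern_profile(phones)
print(profile.most_common())
print(matching(phones, "999-9999"))
```

Selecting the `999-9999` pattern returns the two hyphenated phone numbers, which mirrors how a pattern selection identifies the matching values that a transform can then act on.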

Job Details

After the application successfully executes a job for which profiling is enabled, you can explore a visualization of the generated dataset in the Job Details page. You can also download the visual profile and the results of your data quality rules for the entire dataset in PDF and JSON formats.

...

For more information on job details, see Job Details Page.

Enable

Visual profiling is enabled on a per-job basis. See Run Job Page.

Profiling Engine

The profiling engine is decoupled from the user interface. It performs the calculations that power the visualizations, both before a job is executed and after its results have been generated.

...

For transformation jobs executed in Spark, the profiling job also uses the Spark running environment.


Metric Type                                 Measurement
-----------------------------------------   -----------
Frequency (top-k)                           Approximate
Numerical histograms                        Approximate
Simple statistics (mean, stdev, min, max)   Exact
Quartiles                                   Approximate
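The table distinguishes exact metrics from approximate ones. The Python sketch below contrasts the two under stated assumptions. Simple statistics are computed exactly in one pass with Welford's algorithm, while frequent values are estimated with the Misra-Gries summary, which keeps at most `k - 1` counters no matter how many distinct values the column holds. These are standard techniques chosen for illustration. The table does not say which algorithms the profiling engine uses, and reporting the sample standard deviation (dividing by `n - 1`) is an assumption.

```python
import math


def exact_stats(values):
    """Exact count, mean, sample stdev, min, and max in one pass (Welford).

    Assumption: stdev is the sample standard deviation (divides by n - 1).
    """
    n, mean, m2 = 0, 0.0, 0.0
    lo, hi = math.inf, -math.inf
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # running sum of squared deviations
        lo, hi = min(lo, x), max(hi, x)
    if n == 0:
        raise ValueError("exact_stats() needs at least one value")
    stdev = math.sqrt(m2 / (n - 1)) if n > 1 else 0.0
    return {"count": n, "mean": mean, "stdev": stdev, "min": lo, "max": hi}


def misra_gries(values, k):
    """Approximate frequent values with at most k - 1 counters (Misra-Gries).

    Every value occurring more than n / k times is guaranteed to be kept,
    and each reported count underestimates the true count by at most n / k.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters = {}
    for x in values:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # No free counter: decrement all and drop those reaching zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


data = [3, 1, 3, 2, 3, 3, 5, 1, 3, 4]
print(exact_stats(data))
print(misra_gries(data, k=3))
```

With `k=3`, the value `3` (5 of 10 occurrences) is kept with an estimated count of 3, which is within the n / k bound of its true count. The summary trades that accuracy for memory that stays fixed as the number of distinct values grows.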

...

Snowflake

For jobs executed in Snowflake, profiling jobs may also be executed in Snowflake.


NOTE: The option to push down profiling to Snowflake is set for individual flows, and it is applied only if the job executes successfully in Snowflake. Additional limitations may apply. For more information, see Flow Optimization Settings Dialog.


NOTE: Snowflake calculates quartiles with a different algorithm than Spark does, so some differences in quartile values between the two environments should be expected.

Metric Type                                 Measurement
-----------------------------------------   -----------
Frequency (top-k)                           Approximate
Numerical histograms                        Approximate
Simple statistics (mean, stdev, min, max)   Exact
Quartiles                                   Approximate
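The difference in quartile values between Snowflake and Spark can be illustrated without either engine. The Python sketch below applies the standard library's `statistics.quantiles` with two of its supported methods, `exclusive` and `inclusive`, to the same ten values. It does not reproduce the algorithms used by Spark or Snowflake. It only shows that two legitimate quartile definitions can disagree on identical data.

```python
import statistics

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Two standard quartile definitions applied to identical data.
exclusive = statistics.quantiles(values, n=4, method="exclusive")
inclusive = statistics.quantiles(values, n=4, method="inclusive")

print("exclusive:", exclusive)
print("inclusive:", inclusive)
```

Here the medians agree (5.5), but the lower quartile is 2.75 versus 3.25 and the upper quartile is 8.25 versus 7.75. Approximate algorithms, such as those listed in the table, can add further variation on top of these definitional differences.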
