Overview
...
Visual profiles are available while you transform your data in the Transformer page, when you dig into the detail of individual columns, and after you execute your job at scale. Each of these interfaces has different usage patterns designed to accelerate and simplify data transformation for that specific area of the process.
Uses
- Locate anomalies. Visual profiling surfaces missing or invalid data in individual columns. These values can then be selected and transformed as needed.
- Identify distributions. In the data grid, you can review the value distribution for each column in your dataset. When exploring the column details, you can also identify and select statistical outliers among your column data.
- Identify areas for further refinement. After a job has completed, you can review its visual profile through the application and then take action on problematic data.
...
In the Column Details panel, you can review profiles of the patterns detected in the values of the selected column. Selecting a pattern highlights the values in the column that match it. You can then use these selections as the basis for building transforms that apply to the matching values. For more information, see Column Details Panel.
Job Details
After the application has successfully executed a job for which profiling is enabled, you can explore a visualization of the generated dataset in the Job Details page. You can download your visual profile and the results of your data quality rules for the entire dataset in PDF and JSON formats.
...
For more information on job details, see Job Details Page.
Enable
Visual profiling is enabled on a per-job basis. See Run Job Page.
Profiling Engine
Decoupled from the user interface, the profiling engine performs the calculations required to power the visualizations before job execution and after the job results have been generated.
...
For Spark transformation jobs, profiling is executed in the Spark running environment.
Metric Type | Accuracy |
---|---|
Frequency (top-k) | Approximate |
Numerical histograms | Approximate |
Simple statistics (mean, stdev, min, max) | Exact |
Quartiles | Approximate |
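The table above reflects a cost tradeoff: simple one-pass statistics are cheap to compute exactly, while top-k frequency counts over large data are typically computed with bounded-memory sketches that trade exactness for scalability. As an illustrative sketch only (the profiling engine's actual algorithms are not documented here), a Misra-Gries summary shows why top-k frequencies come out approximate while mean/min/max come out exact:

```python
def approx_top_k(stream, k):
    """Misra-Gries summary: approximate heavy hitters in one pass.

    Keeps at most k counters, so reported counts can undercount by up to
    len(stream) / (k + 1). Illustrative only -- not the product's
    documented algorithm.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # Decrement every counter; evict any that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

def exact_stats(values):
    """Simple statistics are computable exactly in a single cheap pass,
    which is why they are reported as exact."""
    return {"mean": sum(values) / len(values),
            "min": min(values),
            "max": max(values)}

stream = ["a"] * 60 + ["b"] * 25 + ["c"] * 10 + ["d"] * 5
heavy = approx_top_k(stream, k=2)          # frequent items, counts approximate
stats = exact_stats([1.0, 2.0, 3.0, 4.0])  # {"mean": 2.5, "min": 1.0, "max": 4.0}
```

The sketch surfaces the truly frequent values ("a" and "b") but its counts are lower bounds, matching the "Approximate" label for frequency metrics.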
...
Snowflake
For jobs executed in Snowflake, profiling can also be pushed down to Snowflake.
Info |
---|
NOTE: The option to push down profiling to Snowflake is set for individual flows and applies only if the job successfully executes in Snowflake. Additional limitations may apply. For more information, see Flow Optimization Settings Dialog. |
Info |
---|
NOTE: In Snowflake, the calculation of quartiles uses a different algorithm than the same calculation in Spark. Some differences in values should be expected. |
Metric Type | Accuracy |
---|---|
Frequency (top-k) | Approximate |
Numerical histograms | Approximate |
Simple statistics (mean, stdev, min, max) | Exact |
Quartiles | Approximate |
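The note about engine-to-engine differences in quartile values is easy to reproduce: even exact quantile methods disagree depending on the interpolation convention chosen. As an illustration (the specific algorithms Snowflake and Spark use are not documented here), Python's standard library exposes two common conventions that yield different quartiles on the same data:

```python
from statistics import quantiles

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Two standard quartile conventions on identical data. The differing
# results are analogous to the Snowflake-vs-Spark differences noted
# above; this is a generic illustration, not either engine's algorithm.
exclusive = quantiles(values, n=4, method="exclusive")  # [2.75, 5.5, 8.25]
inclusive = quantiles(values, n=4, method="inclusive")  # [3.25, 5.5, 7.75]
```

Both conventions agree on the median here but diverge on the first and third quartiles, so small cross-engine discrepancies in profiled quartiles are expected rather than a sign of corrupted results.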