You might want to generate a profile of the data that you imported from the source. As soon as you create a recipe from a source, you can run a job to profile the dataset.
By profiling the data as soon as you load it into the Transformer page, you can do the following:
- Identify problems in the source and potentially correct them in the source system.
- Create a baseline to evaluate the data wrangling work you do in the product.
- Identify mismatched or missing values.
Tip: You can also use this technique to generate an output of your source data, which is useful if you do not have read access to the source outside of the product.
To profile your source data:
- In the Import Data page, create an imported dataset from your source, and add it to a flow. See Import Data Page. Depending on how your data is structured, you may choose to disable Detect Structure. For more information, see Initial Parsing Steps.
- In Flow View, create a recipe for your imported dataset. See Flow View Page.
- In Flow View, edit the newly created recipe. The recipe opens in the Transformer page. See Transformer Page.
- If needed, add a header step to your dataset.
- Click Run.
- In the Run Job page, select the following options:
  - Running environment: If you can select a running environment, select the default one. This option may not be available in your product.
  - Output format: CSV. You need at least one output format to generate your dataset's profile.
  - Profile results: select this option.
- Click Run.
- When the results are generated, click View Results, and then click the Profile tab in the Job Details page.
- A profile of your dataset is displayed. Review the following:
  - Missing or mismatched values in each column
  - Statistical breakdown by quartile
  - Beginning dataset size and baseline job execution speed
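To get a feel for what the per-column checks in the profile involve, the following Python sketch computes similar statistics for a small CSV string: missing values, mismatched values, and quartiles. It is a hypothetical local approximation, not the product's profiling algorithm. In particular, the rule that a column counts as numeric when more than half of its non-empty values parse as numbers is an assumption made for illustration, as are the function name `profile_csv` and the sample data.

```python
import csv
import io
import statistics


def profile_csv(text):
    """Summarize each CSV column: missing values, mismatched values, and quartiles.

    Hypothetical heuristic (not the product's algorithm): a column is treated as
    numeric when more than half of its non-empty values parse as numbers; the
    non-empty values that do not parse are then counted as mismatched.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = {}
    for name in reader.fieldnames or []:
        # DictReader fills short rows with None, so treat None like an empty cell.
        values = [(row[name] or "").strip() for row in rows]
        present = [v for v in values if v]
        numbers = []
        for v in present:
            try:
                numbers.append(float(v))
            except ValueError:
                pass  # Not a number; counted as mismatched if the column is numeric.
        is_numeric = len(numbers) > len(present) / 2
        columns[name] = {
            "missing": len(values) - len(present),
            "mismatched": len(present) - len(numbers) if is_numeric else 0,
            # Quartile cut points (Q1, median, Q3); requires at least two numbers.
            "quartiles": (
                statistics.quantiles(numbers, n=4, method="inclusive")
                if is_numeric and len(numbers) >= 2
                else None
            ),
        }
    return {"rows": len(rows), "columns": columns}


# Hypothetical sample: "amount" has one empty cell and one non-numeric value.
sample = (
    "id,amount,city\n"
    "1,10,Oslo\n"
    "2,20,\n"
    "3,abc,Rome\n"
    "4,30,Lima\n"
    "5,40,Kyiv\n"
    "6,,Pune\n"
    "7,50,Nice\n"
)

if __name__ == "__main__":
    print(profile_csv(sample)["columns"]["amount"])
    # {'missing': 1, 'mismatched': 1, 'quartiles': [20.0, 30.0, 40.0]}
```

The majority rule is only one way to decide a column's type; the product infers data types with its own logic, so the counts in the Profile tab can differ from this sketch.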
Tip: You might want to write down the overall statistics for the dataset, which may be useful when validating the changes you apply through your recipe.
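One way to put this tip into practice is to record the overall statistics as key/value pairs and compare them with the statistics from a later profile. The following Python sketch assumes you copy the numbers by hand from the Profile tab; the function `compare_to_baseline`, the statistic names, and the values are hypothetical and for illustration only.

```python
def compare_to_baseline(baseline, current, tolerance=0.0):
    """Return {stat: (before, after)} for every statistic that changed.

    Both arguments map hypothetical statistic names to numbers. A statistic
    missing from one side is reported with None for that side.
    """
    changes = {}
    for key in sorted(set(baseline) | set(current)):
        before = baseline.get(key)
        after = current.get(key)
        if before is None or after is None:
            changed = before != after
        else:
            # Numeric statistics may differ by up to `tolerance` without being reported.
            changed = abs(after - before) > tolerance
        if changed:
            changes[key] = (before, after)
    return changes


if __name__ == "__main__":
    # Illustrative numbers only, not real profile results.
    baseline = {"rows": 1000, "missing:amount": 12, "mismatched:amount": 3}
    current = {"rows": 1000, "missing:amount": 0, "mismatched:amount": 0}
    print(compare_to_baseline(baseline, current))
```

A small tolerance is useful for statistics such as job execution speed, which vary from run to run even when the data does not change.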
You can download the profile and output for review.
For more information, see Job Details Page.
Preserve Source Visual Profile
- In Flow View, select the recipe that was used to create the source profile.
- Rename this recipe to something like SourceData.
- Create an output for this recipe. If you have not already created the visual profile, run a job to create it.
- Select the recipe again. Now, click Add New Recipe.
- Edit this new recipe and build out your transformation steps.
Whenever you need to regenerate the profile for the source, select the SourceData recipe and its output. Then, run a job for it.
Tip: This technique is useful if you are replacing the source dataset with refreshed data on a periodic basis.