

Data preparation (or data wrangling) has been a constant challenge for decades, and that challenge has only grown as data volumes have exploded.

Why use the product


Company value: Be a multiplier.

Estimates vary, but roughly 60% of an analyst's time is spent preparing data for use, leaving about two days per week to actually analyze it. That's expensive and inefficient.


The scale and complexity of these transformations can quickly overwhelm even the most powerful of machines.

The product uses a number of techniques to deliver high performance at scale.

Figure: Platform interactions and data movements


You can move results and flows between instances, as needed.


Features may be released here before they are released in other products.

See Supported Deployment Scenarios for Cloudera.

See Supported Deployment Scenarios for Hortonworks.

See Supported Deployment Scenarios for AWS.

See Supported Deployment Scenarios for Azure.


When you finish your recipe, you run a job to generate results. A job executes your set of recipe steps on the source data, without modifying the source, for delivery to a specified output, which defines location, format, compression and other settings.
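To make the job concept concrete, the following is a minimal sketch and not the platform's actual API. It assumes a recipe can be modeled as an ordered list of functions over rows, and that an output is a small settings object. The names OutputSpec, run_job, trim_names, and drop_missing_amount are hypothetical and exist only for illustration.

```python
import csv
import gzip
import io
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, str]
Step = Callable[[List[Row]], List[Row]]


@dataclass
class OutputSpec:
    """Hypothetical output definition: location, format, and compression."""
    path: str
    fmt: str = "csv"
    compression: str = "none"  # "none" or "gzip"


def trim_names(rows: List[Row]) -> List[Row]:
    """Example recipe step: strip whitespace from the 'name' column."""
    return [{**r, "name": r["name"].strip()} for r in rows]


def drop_missing_amount(rows: List[Row]) -> List[Row]:
    """Example recipe step: remove rows whose 'amount' is empty."""
    return [r for r in rows if r["amount"] != ""]


def run_job(source: List[Row], recipe: List[Step], output: OutputSpec) -> List[Row]:
    """Apply the recipe steps in order and write the results to the output."""
    # Work on copies so the source rows are never modified.
    rows = [dict(r) for r in source]
    for step in recipe:
        rows = step(rows)

    if output.fmt != "csv":
        raise ValueError(f"unsupported format in this sketch: {output.fmt}")

    # Serialize the results as CSV text.
    buf = io.StringIO()
    if rows:
        writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    data = buf.getvalue().encode("utf-8")

    # Apply the requested compression setting.
    if output.compression == "gzip":
        data = gzip.compress(data)
    elif output.compression != "none":
        raise ValueError(f"unsupported compression: {output.compression}")

    with open(output.path, "wb") as fh:
        fh.write(data)
    return rows
```

Calling run_job(source, [trim_names, drop_missing_amount], OutputSpec("results.csv.gz", compression="gzip")) would write the cleaned rows to a gzip-compressed CSV file and leave the source list unchanged.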



Datasets, recipes, and outputs can be grouped together into objects called flows. A flow is a unit of organization in the platform. Depending on your product, flows can be shared between users, scheduled for automated execution, and exported and imported into the platform. In this manner, you can build and test your recipes, chain together sets of datasets and recipes in a flow, share your work with others, and operationalize your production datasets for automated execution.
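As a rough illustration of a flow as a unit of organization, the sketch below groups datasets, recipes, and outputs into one object that can be exported and imported. This is an assumption-laden model, not the platform's real flow format: the Flow class, its fields, and the export_flow and import_flow helpers are hypothetical, and JSON is chosen here only for convenience.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class Flow:
    """Hypothetical flow: a named group of datasets, recipes, and outputs."""
    name: str
    datasets: List[str] = field(default_factory=list)
    # Recipe name mapped to its ordered list of step descriptions.
    recipes: Dict[str, List[str]] = field(default_factory=dict)
    # Each output is a plain settings dictionary (location, format, and so on).
    outputs: List[Dict[str, str]] = field(default_factory=list)


def export_flow(flow: Flow) -> str:
    """Serialize a flow to JSON text so it can be moved or shared."""
    return json.dumps(asdict(flow), indent=2, sort_keys=True)


def import_flow(text: str) -> Flow:
    """Rebuild a flow from JSON text produced by export_flow."""
    return Flow(**json.loads(text))
```

In this model, exporting a flow from one instance and importing it into another amounts to passing the JSON text between them.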


In addition to the above, the following key features simplify the data prep process and bring enterprise-grade tools for managing your production wrangling efforts.

Visual Profiling


For individual columns in your dataset, data histograms and data quality information immediately identify potential issues with the column. Select from these color-coded bars, and specific suggestions for transformations are surfaced for you. When you make a selection, you can optionally choose to display only the rows or columns affected by the change.
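The kind of per-column information described above can be approximated with a short sketch. It assumes a column expected to hold numbers, and it sorts each value into valid, mismatched, or missing before building an equal-width histogram of the valid values. The function name profile_column, the three categories, and the binning rule are illustrative assumptions, not the product's actual profiling logic.

```python
import math
from typing import Dict, List, Optional


def profile_column(values: List[Optional[str]], bins: int = 4) -> Dict[str, object]:
    """Summarize data quality and value distribution for one numeric column."""
    valid: List[float] = []
    mismatched = 0
    missing = 0

    for v in values:
        # Empty or absent values count as missing.
        if v is None or v.strip() == "":
            missing += 1
            continue
        try:
            number = float(v)
        except ValueError:
            mismatched += 1
            continue
        # Treat NaN and infinities as mismatched so binning stays well defined.
        if math.isfinite(number):
            valid.append(number)
        else:
            mismatched += 1

    # Equal-width histogram over the valid values.
    histogram = [0] * bins
    if valid:
        lo, hi = min(valid), max(valid)
        width = (hi - lo) / bins or 1.0  # avoid zero width when all values match
        for x in valid:
            idx = min(int((x - lo) / width), bins - 1)
            histogram[idx] += 1

    return {
        "valid": len(valid),
        "mismatched": mismatched,
        "missing": missing,
        "histogram": histogram,
    }
```

A column with many mismatched or missing values is the sort of case where a transformation suggestion, such as replacing or deleting the affected rows, would be most useful.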