
Release 4.1.1

This release introduces scheduling of dataset execution from within your flows, as well as a number of bug fixes and system improvements.

What's New


Admin, Install, & Config:

Workspace:

  • Schedule executions of one or more wrangled datasets within a flow. See Flow View Page.

Transformer Page:

Changes to System Behavior

Single-file run_job action is deprecated in the CLI

See Changes to the Command Line Interface.

 

Key Bug Fixes

  • TD-25615: Error in flat-aggregation generation in the Spark running environment.
  • TD-25438: Deleting an upstream reference node does not propagate results correctly to the Transformer page.
  • TD-15509: TDE files generated by the TDE download option may fail to open in Tableau if column names are more than 124 characters in length.

    NOTE: When you run the job, include a publishing option to publish to TDE format. When you export the generated results, this issue no longer appears in the output.
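If you are still on a release without this fix, a generic preprocessing step is to shorten column names before generating the extract. The helper below is an illustrative sketch, not a product feature; the 124-character limit comes from the note above, and the de-duplication scheme is an assumption to keep truncated names unique.

```python
def shorten_columns(names, limit=124):
    """Truncate column names to `limit` characters, de-duplicating
    collisions with a numeric suffix so no two names end up identical."""
    seen = {}
    out = []
    for name in names:
        short = name[:limit]
        if short in seen:
            # Truncation produced a duplicate; append a counter suffix
            # while keeping the result within the limit.
            seen[short] += 1
            suffix = f"_{seen[short]}"
            short = short[:limit - len(suffix)] + suffix
        else:
            seen[short] = 0
        out.append(short)
    return out

cols = ["x" * 200, "x" * 200, "short_name"]
print([len(c) for c in shorten_columns(cols)])  # -> [124, 124, 10]
```

Run this over the dataset's header row before handing the data to the TDE writer; names already within the limit pass through unchanged.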

New Known Issues

  • TD-26041 (Workspace): When editing a schedule that was set for 0 minutes after the hour, the schedule is displayed to execute at 15 minutes after the hour.

    Workaround: This is a display bug only; the correct value is saved when the schedule is set to 0 minutes after the hour.

Release 4.1

This release includes the ability to share flows and a completely revamped Transformer page for a simpler, faster, and more consistent user experience. From the Transformer page, you can now collect ad-hoc samples using a wider variety of techniques. New integration and publishing options broaden the platform's reach throughout the enterprise. Read below for additional features and details.

What's New

Admin, Install, & Config:

Warning

Support for integration with MapR Hadoop clusters has been deprecated. The platform continues to support Cloudera and Hortonworks. For more information on other available options, please contact your representative.

Info

NOTE: Support for CentOS 6.2.x and CentOS 6.3.x has been deprecated. Please upgrade to the latest CentOS 6.x release.

 

Import:

Workspace:

  • Improved user experience with flows. See Flow View Page.
  • Share a flow with one or more users, so you can collaborate on the same assets. See Flow View Page.

Transformer Page:

  • New navigation and layout for the Transformer page simplifies working with data and increases the area of the data grid. See Transformer Page.
  • Sampling improvements:
    • Enhanced sampling methods provide access to customizable, task-oriented subsets of your data. See Samples Panel.
    • Improved Transformer page loading due to persistence of the initial sample.
    • For more information on the new sampling methods, see Overview of Sampling.
  • Highlight the recipe steps where a specific column is referenced. See Column Menus.

Compilation/Execution:

  • Publishing to Hive:
    • You can now publish directly to Hive as part of job execution. Just configure a new publishing action. See Run Job Page.
    • Enhanced publishing options for Hive target tables including Create, Append, Drop & Truncate.  See Export Results Window.
  • Photon jobs can be automatically killed based on configurable runtime and memory consumption thresholds. See Configure Photon Running Environment.
  • The Photon running environment now supports Parquet format.

Admin:

  • SSO integration with AD/LDAP now supports auto-registration for users visiting the web application. See Configure SSO for AD-LDAP.

Language:

Changes to System Behavior

Hadoop Pig running environment is no longer available

As of Release 4.1, the Pig running environment is no longer available for execution of jobs. Implications:

  • Deployments that are connected to a Hadoop cluster must use Spark for job execution. See Configure Spark Running Environment.
  • CLI scripts that reference running jobs on the Pig running environment must be updated. See Changes to the Command Line Interface.
  • Integration with Cloudera Navigator is not supported in this release.
  • Integration with HDI/WASB is supported but may require further configuration. Please contact support.

Python UDFs are no longer available

With the removal of the Hadoop Pig running environment, Python user-defined functions are no longer available. 

Info

NOTE: As of Release 4.1, all user-defined functions must be migrated to or created in Java. For more information, see Java UDFs.

For more information on migrating UDFs, see Changes to the User-Defined Functions.

Transform Editor has been removed

In Release 4.0.1 and earlier, you could type transformation steps as plain text in the Transform Editor, as well as use the Transform Builder.

In Release 4.1 and later, the Transform Editor has been removed, in favor of an enhanced version of the Transform Builder.

Tip

Tip: You can copy and paste raw text commands into the Transformation/Choose a transformation textbox of the Transform Builder. The documentation still displays example transformation steps as text commands.

See Transform Builder.

Dependencies Browser has been replaced

In Release 4.0.1, you could explore dependencies between your datasets through the Dependencies Browser, which was accessible through a graph in the toolbar in the Transformer page. 

In Release 4.1, this browser has been replaced by the Dataset Navigator. In the Transformer page, click the drop-down next to the name of the current dataset. In the Dataset Navigator, you can browse the datasets through a list or flow view to locate another wrangled dataset to load.

In Release 4.2 and later, this browser has been renamed the Recipe Navigator. See Recipe Navigator.

Manual database installation is no longer required

Prior to Release 4.0, the databases had to be installed manually. 

In Release 4.0 and later, the databases are installed for you on the local server as part of the basic install process. For more information, see Set up the Databases.

If you need to re-install the databases, manual steps are still available. See Install the Databases.

Head sample replaced by random sample on upgrade

If your dataset used the initial rows (head) sample in the data grid in Release 4.0 or earlier, that sample is replaced by a random sample after the upgrade.

Tip

Tip: When the dataset is loaded in the Transformer page after upgrade, you can switch the sample back to the first rows sample. For more information, see Samples Panel.


Miscellaneous

  • The Send a Copy feature introduced in Release 4.0 has been integrated with the general sharing capabilities. See Share Flow Dialog.
  • Ad-hoc publishing to Redshift in CSV format is no longer supported. See Export Results Window.

Key Bug Fixes

  • TD-23787: When the publishing location is unavailable, a spinning wheel hangs indefinitely without any error message.
  • TD-22467: Last active sample is not displayed during preview of multi-dataset operations.
  • TD-22128: Cannot read multi-file Avro stream if data is greater than 500 KB.
  • TD-20796: For a date column, Spark profiling shows an incorrect set of dates when the source data contains a single date.
  • TD-19865: You cannot configure a publishing location to be a directory that does not already exist. See Run Job Page.
  • TD-17657: The splitrows transform allows splitting even if its required on parameter is set to an empty value.

New Known Issues

  • TD-25419 (Profiling): When a pivot transform is applied, some column histograms may not be updated.

    Workaround: Refresh the page.

  • TD-25000 (Connectivity): Cannot publish to Cloudera Navigator due to 500 - Internal Server Error.

    Workaround: The Cloudera Navigator integration is not supported in this release. If it was enabled in your deployment in a prior release, it must be disabled. To disable it, set the following property value in platform configuration:

    "clouderaNavigator.enabled": false,

  • TD-24358 (Compilation/Execution): A circular reference in the schema of an Avro file causes a job in the Spark running environment to fail. See https://issues.apache.org/jira/browse/AVRO-1285.
  • TD-20882 (Connectivity): Spark jobs based on relational sources fail if one or more columns is dropped from the source table.
  • TD-21836 (Transformer Page): Values in a newly collected sample do not appear in sorted order, even though a sort transform had been previously applied.

    Workaround: Re-apply the sort transform to the new sample. Some limitations apply. For more information, see Sort Transform.
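For context on TD-24358: an Avro schema becomes circular when a record's field refers back to the record's own type name. The minimal linked-list-shaped schema below is an illustrative sketch (the `Node` name and fields are invented for the example, not taken from any product dataset); schemas of this shape are legal Avro but can trip the behavior tracked in AVRO-1285 when read in the Spark running environment.

```python
import json

# A hypothetical self-referential (circular) Avro schema: the "next" field
# references the enclosing record type "Node" by name, forming the cycle.
linked_list_schema = {
    "type": "record",
    "name": "Node",
    "fields": [
        {"name": "value", "type": "long"},
        # Referencing "Node" here is what makes the schema circular.
        {"name": "next", "type": ["null", "Node"], "default": None},
    ],
}

print(json.dumps(linked_list_schema, indent=2))
```

If jobs fail on such files, flattening the recursion (for example, serializing the nested reference as a string or restructuring the data into separate records) avoids the circular schema.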