On the Library page, you can review your imported datasets, reference datasets, and any macros that you have created. You can also import new data from this page.
NOTE: You can see only the imported datasets to which you have access in your currently selected project or workspace. If the data underlying an imported dataset is unavailable, the imported dataset is still listed on the Library page, because it is only a reference to the data.
To create a new imported dataset, click Import Data. See Import Data Page.
For large relational or Parquet datasets, you can monitor the import process through the Library page.
- During the import process, you can hover over the icon for a pending dataset to track status.
- Click the icon for additional details. See Dataset Details Page.
For more information, see Overview of Job Monitoring.
Filter by type:
Click one of the predefined filters to show objects of the following types:
All Data: All imported datasets or references available to the current user.
Imported Datasets: Datasets that you have imported into Dataprep by Trifacta®.
- The Source column indicates where the original source data is located.
- You can also access datasets that were imported through a configured connection. For more information, see Connections Page.
References: Reference datasets that you have created from your recipes. A reference dataset can be used as a dataset in another flow.
Macros: Sequences of steps that can be reused in other recipes. See Macros Page.
Filter by ownership:
For the selected object type, you can filter based on the ownership of the object:
- All: All objects of the selected type to which you have access.
- Owned by me: All objects of the selected type that you own.
- Shared with me: All objects of the selected type that have been shared with you.
Columns:
- Name: Name of the object.
- In Flows: Number of flows in which the object is used.
- Source: Flow or datastore where the object is located.
- Last Updated: Timestamp of the most recent modification to the object.
Navigation:
- Browse: If displayed, use the page browsing controls to explore the available objects.
- Search: To search object names, enter a string in the search bar. Matching results are highlighted immediately on the Library page.
- Sort: Click a column header to sort the list by that column's values.
Hover over an object to reveal the following actions on the right side of the screen:
- Details: Review details about the dataset. See Dataset Details Page.
- Preview: Inspect a preview of the dataset.
NOTE: Preview is not available for binary format sources.
- Use in new Flow: (Imported dataset only) Create a new flow and begin wrangling the dataset immediately. This action also creates a recipe in the flow.
- Add to Flow: Add the dataset to a new or existing flow.
- Make a copy: Create a copy of the imported dataset. This option is not available for reference datasets.
- Edit name and description: Change the name and description of the dataset.
- Edit data settings: If the source of the imported dataset requires conversion to an internally supported format, you can modify the settings for that conversion process. For more information, see File Import Settings.
Tip: This setting applies primarily to binary file formats, such as PDF and Excel, or file formats that may require additional steps to convert into tabular data, such as JSON.
- Delete Dataset: Delete the dataset.
Deleting a dataset cannot be undone.
- Refresh Dataset: If available, refresh the dataset's metadata with the latest schema from the source.
NOTE: If you refresh the schema of a parameterized dataset that is based on a set of files, only the schema of the first file is checked for changes. If changes are detected, the other files are assumed to contain the same changes. As a result, changes can be wrongly assumed for later files or go undetected in them, potentially corrupting data in the flow.
For more information, see Overview of Schema Management.
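The risk described in the note above can be illustrated with a small model. The following Python sketch is not the product's implementation: it assumes that each file in a parameterized dataset can be represented only by its list of column names, and the function names refresh_from_first_file and mismatched_files are hypothetical. It shows both failure modes: a change in the first file is assumed for all files, and a change in a later file is never seen.

```python
from typing import List

# Hypothetical model: each file in a parameterized dataset is represented
# only by its header (list of column names). Not the product's real API.
Header = List[str]


def refresh_from_first_file(stored: Header, files: List[Header]) -> Header:
    """Models the documented behavior: only the first file is compared
    with the stored schema."""
    first = files[0]
    if first != stored:
        # Change detected: it is assumed to apply to every file in the set.
        return list(first)
    # No change in the first file: later files are never inspected.
    return list(stored)


def mismatched_files(schema: Header, files: List[Header]) -> List[int]:
    """Indexes of files whose header does not match the given schema."""
    return [i for i, header in enumerate(files) if header != schema]


stored = ["id", "name", "amount"]

# Case 1: the first file gained a column, so the change is assumed for all files.
files_a = [["id", "name", "amount", "region"], ["id", "name", "amount"]]
schema_a = refresh_from_first_file(stored, files_a)
print(schema_a, mismatched_files(schema_a, files_a))  # ['id', 'name', 'amount', 'region'] [1]

# Case 2: only a later file changed, so the change goes undetected.
files_b = [["id", "name", "amount"], ["id", "amount"]]
schema_b = refresh_from_first_file(stored, files_b)
print(schema_b, mismatched_files(schema_b, files_b))  # ['id', 'name', 'amount'] [1]
```

In both cases, at least one file no longer matches the refreshed schema, which is why the note recommends caution when refreshing parameterized datasets.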