Create an imported dataset from an available resource. The created dataset is owned by the authenticated user.

NOTE: When an imported dataset is created via API, it is always imported as an unstructured dataset. Any recipe that references this dataset should contain initial parsing steps required to structure the data.


NOTE: Do not create an imported dataset from a file that is being used by another imported dataset. If you delete the newly created imported dataset, the file is removed, and the other dataset is corrupted. Use a new file, or make a copy of the original file and import the copy.

Version: v4

Required Permissions

Request and Response

Request Type: POST

Endpoint:

/v4/importedDatasets

Response Status Code - Success:  201 - Created
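
As a rough illustration of the request described above, the following Python sketch builds (but does not send) a POST to the endpoint using only the standard library. The base URL and the bearer-token header are assumptions made for illustration; this section does not specify the authentication scheme, so substitute whatever your deployment requires.

```python
import json
import urllib.request

ENDPOINT = "/v4/importedDatasets"
SUCCESS_STATUS = 201  # "201 - Created", as listed above


def build_create_request(base_url, body, token):
    """Build (without sending) the POST request that creates an imported dataset.

    base_url and token are hypothetical inputs; the Authorization header
    format is an assumption, not something this page documents.
    """
    return urllib.request.Request(
        url=base_url.rstrip("/") + ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,  # assumed auth scheme
        },
        method="POST",
    )


def is_created(status_code):
    """Return True when the status code signals that the dataset was created."""
    return status_code == SUCCESS_STATUS
```

To actually submit the request, the object returned by build_create_request can be passed to urllib.request.urlopen, and the status of the response checked with is_created.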

 

Examples by Type

Below, you can review the basic request body for creating imported datasets for various types of sources:

File (HDFS and S3 sources)

Request Body - HDFS file:

Below, the bucket value is set to null. This parameter applies only to S3 sources.

NOTE: The path value should not include the HDFS protocol, host, or port information. You only need to provide the path on HDFS.


{
  "path": "/tri-hdfs/uploads/1/4aee9852-cf92-47a8-8c6a-9ff2adeb3b4a/POS-r02.txt",
  "type": "hdfs",
  "bucket": null,
  "name": "POS-r02b.txt",
  "description": "POS-r02 - copy"
}
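
If you start from a full HDFS URI, the protocol, host, and port have to be dropped before the value is used as the path. The sketch below shows one possible way to do that with the Python standard library; the helper names and the namenode host in the usage note are hypothetical.

```python
from urllib.parse import urlparse


def hdfs_path_only(uri_or_path):
    """Return only the path portion, dropping any protocol, host, and port.

    Caveat: urlparse treats '?' and '#' as delimiters, so a path that
    contains those characters would be cut short by this helper.
    """
    return urlparse(uri_or_path).path


def build_hdfs_body(uri_or_path, name, description):
    """Assemble a request body for an HDFS file."""
    return {
        "path": hdfs_path_only(uri_or_path),
        "type": "hdfs",
        "bucket": None,  # serialized as JSON null; applies only to S3 sources
        "name": name,
        "description": description,
    }
```

For example, calling hdfs_path_only on hdfs://namenode:8020/tri-hdfs/uploads/1/POS-r02.txt yields /tri-hdfs/uploads/1/POS-r02.txt, and a value that is already a bare path passes through unchanged.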

 

Request Body - S3 file:

For S3 sources, a bucket must be specified. Below, the bucket value is set to myBucket.

NOTE: The path value should not include the S3 protocol, host, or port information. You only need to provide the path on S3.



{
  "path": "/tri-h26/uploads/1/343647c7-5b23-41c8-9397-b40a1ff415ea/USDA_Farmers_Market_2014.avro",
  "type": "s3",
  "bucket": "myBucket",
  "name": "USDA Farmers Market 2014b",
  "description": "USDA Farmers Market 2014 - copy"
}
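
Because an S3 request needs the bucket and the path as separate values, a full S3 URI has to be split before it is submitted. The following is a minimal sketch of that split, assuming the URI has the form s3://bucket/path; the function name is hypothetical.

```python
from urllib.parse import urlparse


def build_s3_body(s3_uri, name, description):
    """Split an s3://bucket/path URI into the bucket and path request fields."""
    parsed = urlparse(s3_uri)
    if not parsed.netloc:
        # For S3 sources, a bucket must be specified.
        raise ValueError("an S3 source requires a bucket: " + s3_uri)
    return {
        "path": parsed.path,      # path only, without protocol or bucket
        "type": "s3",
        "bucket": parsed.netloc,  # netloc keeps the bucket's original case
        "name": name,
        "description": description,
    }
```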

Response Body - file:

The following example is for an S3 file. For an HDFS file:

{
    "id": 63,
    "size": "5053205",
    "path": "/tri-h26/uploads/1/343647c7-5b23-41c8-9397-b40a1ff415ea/USDA_Farmers_Market_2014.avro",
    "dynamicPath": null,
    "type": "s3",
    "bucket": "myBucket",
    "isSchematized": true,
    "isDynamic": false,
    "disableTypeInference": false,
    "createdAt": "2018-02-03T01:08:32.867Z",
    "updatedAt": "2018-02-03T01:08:34.185Z",
    "parsingRecipe": {
        "id": 121
    },
    "runParameters": {
        "data": []
    },
    "name": "USDA Farmers Market 2014b",
    "description": "USDA Farmers Market 2014 - copy",
    "creator": {
        "id": 1
    },
    "updater": {
        "id": 1
    },
    "connection": null
}
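
In the response above, id is a JSON number while size is returned as a string. A client that needs numeric values therefore has to convert it. The sketch below shows one way to pull out the commonly needed fields; the function name and the shape of the returned dictionary are illustrative choices, not part of the API.

```python
import json


def summarize_file_response(response_text):
    """Extract a few fields from the response body of a file-based dataset."""
    doc = json.loads(response_text)
    return {
        "id": doc["id"],
        "size": int(doc["size"]),  # size arrives as a string, e.g. "5053205"
        "parsing_recipe_id": doc["parsingRecipe"]["id"],
        "type": doc["type"],
        "bucket": doc["bucket"],   # None when the JSON value is null
        "path": doc["path"],
    }
```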

Hive

Request Body - Hive:

{
  "visible": true,
  "numFlows": 0,
  "size": -1,
  "type": "jdbc",
  "jdbcType": "TABLE",
  "jdbcPath": [
    "default"
  ],
  "jdbcTable": "farmers_market_recipe_tri",
  "columns": [
    "fmid",
    "market_name"
  ],
  "connectionId": 1,
  "name": "Farmer's Market Data"
} 
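
The request body above can also be assembled programmatically. The following sketch mirrors the fields of the example as they appear; the function name is hypothetical, and the fixed values (visible, numFlows, size, jdbcType) are simply copied from the example rather than taken from a documented list of defaults.

```python
import json


def build_table_body(connection_id, jdbc_path, table, columns, name):
    """Assemble a request body for a table read through a connection."""
    return {
        "visible": True,
        "numFlows": 0,
        "size": -1,
        "type": "jdbc",
        "jdbcType": "TABLE",
        "jdbcPath": list(jdbc_path),  # e.g. the database or schema name
        "jdbcTable": table,
        "columns": list(columns),
        "connectionId": connection_id,
        "name": name,
    }


# Rebuild the Hive example shown above and serialize it for the request.
hive_body = build_table_body(
    1, ["default"], "farmers_market_recipe_tri",
    ["fmid", "market_name"], "Farmer's Market Data",
)
hive_payload = json.dumps(hive_body, indent=2)
```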

Response Example - Hive:

{
    "jdbcTable": "farmers_market_recipe_tri",
    "jdbcPath": [
        "default"
    ],
    "columns": [
        "fmid",
        "market_name"
    ],
    "filter": null,
    "raw": null,
    "id": 19,
    "size": "-1",
    "path": null,
    "dynamicPath": null,
    "type": "jdbc",
    "bucket": null,
    "isSchematized": true,
    "isDynamic": false,
    "disableTypeInference": false,
    "createdAt": "2018-02-26T19:19:33.069Z",
    "updatedAt": "2018-02-26T19:19:33.720Z",
    "parsingRecipe": {
        "id": 35
    },
    "relationalSource": {
        "relationalPath": [
            "default"
        ],
        "columns": [
            "fmid",
            "market_name"
        ],
        "filter": null,
        "raw": null,
        "id": 2,
        "tableName": "farmers_market_recipe_tri",
        "createdAt": "2018-02-26T19:19:33.074Z",
        "updatedAt": "2018-02-26T19:19:33.074Z",
        "importedDataset": {
            "id": 19
        }
    },
    "runParameters": {
        "data": []
    },
    "connection": {
        "name": "hive",
        "creator": {
            "id": 1,
            "email": "ad@example.com",
            "name": "Administrator"
        },
        "id": 1,
        "inaccessibleToUser": true
    },
    "name": "Farmer's Market Data",
    "creator": {
        "id": 1
    },
    "updater": {
        "id": 1
    },
    "workspace": {
        "id": 1
    }
}

Relational

Request Body - Relational:

{
  "visible": true,
  "numFlows": 0,
  "size": -1,
  "type": "jdbc",
  "jdbcType": "TABLE",
  "jdbcPath": [
    "public"
  ],
  "jdbcTable": "datasources",
  "columns": [
    "id",
    "size",
    "path"
  ],
  "connectionId": 3,
  "name": "My DB Table"
}

Response Example - Relational:

{
    "jdbcTable": "datasources",
    "jdbcPath": [
        "public"
    ],
    "columns": [
        "id",
        "size",
        "path"
    ],
    "filter": null,
    "raw": null,
    "id": 23,
    "size": "-1",
    "path": null,
    "dynamicPath": null,
    "type": "jdbc",
    "bucket": null,
    "isSchematized": true,
    "isDynamic": false,
    "disableTypeInference": false,
    "createdAt": "2018-02-26T19:32:52.898Z",
    "updatedAt": "2018-02-26T19:32:53.613Z",
    "parsingRecipe": {
        "id": 37
    },
    "relationalSource": {
        "relationalPath": [
            "public"
        ],
        "columns": [
            "id",
            "size",
            "path"
        ],
        "filter": null,
        "raw": null,
        "id": 4,
        "tableName": "datasources",
        "createdAt": "2018-02-26T19:32:52.904Z",
        "updatedAt": "2018-02-26T19:32:52.904Z",
        "importedDataset": {
            "id": 23
        }
    },
    "runParameters": {
        "data": []
    },
    "connection": {
        "name": "postgres",
        "creator": {
            "id": 1,
            "email": "ad@example.com",
            "name": "Administrator"
        },
        "id": 3,
        "inaccessibleToUser": true
    },
    "name": "My DB Table",
    "creator": {
        "id": 1
    },
    "updater": {
        "id": 1
    },
    "workspace": {
        "id": 1
    }
}
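
In the examples above, the response repeats the table, path, and column values from the request, both at the top level and inside relationalSource. The sketch below checks that correspondence after a create call. This relationship is inferred from the examples only and is not a documented guarantee; the function name is hypothetical.

```python
def response_matches_request(request_body, response_doc):
    """Return True if the response reflects the table named in the request."""
    source = response_doc.get("relationalSource") or {}
    connection = response_doc.get("connection") or {}
    checks = [
        response_doc.get("jdbcTable") == request_body.get("jdbcTable"),
        response_doc.get("jdbcPath") == request_body.get("jdbcPath"),
        response_doc.get("columns") == request_body.get("columns"),
        response_doc.get("name") == request_body.get("name"),
        connection.get("id") == request_body.get("connectionId"),
        # The nested relationalSource uses different key names.
        source.get("tableName") == request_body.get("jdbcTable"),
        source.get("relationalPath") == request_body.get("jdbcPath"),
    ]
    return all(checks)
```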

Relational with Custom SQL Query

You can submit custom SQL queries to relational or Hive connections. These custom SQL queries can be used to pre-filter the data inside the database, improving performance of the query and the overall dataset.

Request Body:

Notes:

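The following example quotes identifiers with backticks, as used for a Hive connection. For databases that quote identifiers with double-quote marks, such as Oracle, note that each double-quote mark must be escaped inside the JSON string.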

NOTE: Syntax for the custom SQL query varies between relational systems. For more information on syntax examples, see Create Dataset with SQL.


{
  "visible": true,
  "numFlows": 0,
  "size": -1,
  "type": "jdbc",
  "jdbcType": "TABLE",
  "connectionId": 1,
  "raw": "SELECT * FROM `default`.`farmers_market_recipe_tri`",
  "name": "Farmer's Market Data - Custom SQL Query"
}
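
Escaping the double-quote marks by hand is error-prone. If the request body is built as a dictionary and serialized with a JSON library, the escaping is applied automatically. The sketch below illustrates this with a query that uses double-quoted identifiers; the schema name, table name, and function name are hypothetical.

```python
import json


def build_custom_sql_body(connection_id, sql, name):
    """Assemble a request body whose source is a custom SQL query."""
    return {
        "visible": True,
        "numFlows": 0,
        "size": -1,
        "type": "jdbc",
        "jdbcType": "TABLE",
        "connectionId": connection_id,
        "raw": sql,  # the query text, unescaped at this point
        "name": name,
    }


# Hypothetical query using double-quoted identifiers.
sql = 'SELECT * FROM "MYSCHEMA"."MYTABLE"'
body = build_custom_sql_body(1, sql, "My Custom SQL Query")

# json.dumps writes each double-quote mark inside the string as \"
payload = json.dumps(body)
```

Decoding payload again with json.loads returns the original query text, so the escaping affects only the serialized form.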

Response Body:

In the response, note that the source of the data is defined by the connectionId value and the SQL defined in the raw value.

{
    "jdbcTable": null,
    "jdbcPath": null,
    "columns": null,
    "filter": null,
    "raw": [
        "SELECT * FROM `default`.`farmers_market_recipe_tri`"
    ],
    "id": 21,
    "size": "-1",
    "path": null,
    "dynamicPath": null,
    "type": "jdbc",
    "bucket": null,
    "isSchematized": true,
    "isDynamic": true,
    "disableTypeInference": false,
    "createdAt": "2018-02-26T19:25:33.000Z",
    "updatedAt": "2018-02-26T19:25:33.884Z",
    "parsingRecipe": {
        "id": 36
    },
    "relationalSource": {
        "relationalPath": null,
        "columns": null,
        "filter": null,
        "raw": [
            "SELECT * FROM `default`.`farmers_market_recipe_tri`"
        ],
        "id": 3,
        "tableName": null,
        "createdAt": "2018-02-26T19:25:33.006Z",
        "updatedAt": "2018-02-26T19:25:33.006Z",
        "importedDataset": {
            "id": 21
        }
    },
    "runParameters": {
        "data": []
    },
    "connection": {
        "name": "hive",
        "creator": {
            "id": 1,
            "email": "ad@example.com",
            "name": "Administrator"
        },
        "id": 1,
        "inaccessibleToUser": true
    },
    "name": "Farmer's Market Data - Custom SQL Query",
    "creator": {
        "id": 1
    },
    "updater": {
        "id": 1
    },
    "workspace": {
        "id": 1
    }
}
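
Note that raw is submitted as a single string in the request but appears as a list of strings in the response above. A client that reads the query back may want to accept either form. The following sketch does so; the function name is hypothetical, and handling the plain-string case is a defensive assumption rather than documented behavior.

```python
def custom_sql_statements(response_doc):
    """Return the custom SQL from a response as a list of strings."""
    raw = response_doc.get("raw")
    if raw is None:
        return []        # table-based datasets report raw as null
    if isinstance(raw, str):
        return [raw]     # defensive: tolerate a plain string
    return list(raw)
```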

Reference

For more information on the properties of an imported dataset, see API ImportedDatasets Get v4.