Create an imported dataset from an available resource. The created dataset is owned by the authenticated user.

NOTE: Do not create an imported dataset from a file that is already used by another imported dataset. If you later delete the newly created imported dataset, the file is removed, and the other dataset is corrupted. Use a new file, or make a copy of the original file first.

Version: v4

Required Permissions

Request and Response

Request Type: POST

Endpoint:

/v4/importedDatasets

Response Status Code - Success: 201 - Created
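The request type, endpoint, and success status above can be sketched as a small Python client. This is a minimal sketch, not the platform's official client: the host name, the bearer-token authentication scheme, and the sample body values are assumptions made for illustration.

```python
import json
import urllib.request


def build_create_request(base_url, token, body):
    """Build (but do not send) a POST request to /v4/importedDatasets."""
    url = base_url.rstrip("/") + "/v4/importedDatasets"
    data = json.dumps(body).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumption: bearer-token auth; use whatever your deployment requires.
        "Authorization": "Bearer " + token,
    }
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def send(request):
    """Send the request and return the parsed JSON body of a 201 response."""
    with urllib.request.urlopen(request) as response:
        if response.status != 201:
            raise RuntimeError("unexpected status: %d" % response.status)
        return json.loads(response.read().decode("utf-8"))


# Hypothetical host, token, and file path, for illustration only.
request = build_create_request(
    "https://wrangler.example.com",
    "my-token",
    {"path": "/tmp/example.txt", "type": "hdfs", "bucket": None, "name": "example"},
)
```

Calling `send(request)` performs the actual call; building the request separately makes it easy to inspect the URL and body before sending.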

 

Examples by Type

The basic request bodies for creating imported datasets from various types of sources are shown below:

File (HDFS and S3 sources)

Request Body - HDFS file:

NOTE: The path value should not include the HDFS protocol, host, or port information. You only need to provide the path on HDFS.


NOTE: Below, the detectStructure parameter forces initial parsing steps to be applied on import.


{
  "path": "/tri-hdfs/uploads/1/4aee9852-cf92-47a8-8c6a-9ff2adeb3b4a/POS-r02.txt",
  "type": "hdfs",
  "bucket": null,
  "name": "POS-r02b.txt",
  "description": "POS-r02 - copy",
  "detectStructure": true
}
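If your code holds a full HDFS URI, the protocol, host, and port need to be dropped before the value is used as path. The following is a minimal sketch of one way to do that; the helper names, the namenode host, and the sample file location are hypothetical.

```python
from urllib.parse import urlparse


def hdfs_path_only(location):
    """Drop any scheme, host, and port, keeping only the path on HDFS.

    Caveat: urlparse treats '?' and '#' as query and fragment markers,
    so paths containing those characters need extra handling.
    """
    return urlparse(location).path


def hdfs_dataset_body(location, name, description=None, detect_structure=True):
    """Build a request body shaped like the HDFS example above."""
    body = {
        "path": hdfs_path_only(location),
        "type": "hdfs",
        "bucket": None,  # serialized as JSON null
        "name": name,
        "detectStructure": detect_structure,
    }
    if description is not None:
        body["description"] = description
    return body


# Hypothetical namenode host and file location.
body = hdfs_dataset_body(
    "hdfs://namenode.example.com:8020/data/POS-r02.txt", "POS-r02b.txt"
)
```

A bare path such as `/data/POS-r02.txt` passes through `hdfs_path_only` unchanged.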

 

Request Body - S3 file:

For S3 sources, a bucket must be specified. Below, the bucket value is set to myBucket.

NOTE: The path value should not include the S3 protocol, host, or port information. You only need to provide the path on S3.



{
  "path": "/tri-h26/uploads/1/343647c7-5b23-41c8-9397-b40a1ff415ea/USDA_Farmers_Market_2014.avro",
  "type": "s3",
  "bucket": "myBucket",
  "name": "USDA Farmers Market 2014b",
  "description": "USDA Farmers Market 2014 - copy"
}
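The two S3 requirements above (a bucket is mandatory, and the path carries no protocol or host) can be checked on the client before the request is sent. This is a sketch under assumptions: the function name and the validation rules are illustrative and say nothing about how the server itself validates requests.

```python
def s3_dataset_body(path, bucket, name, description=None):
    """Build a request body shaped like the S3 example above."""
    if not bucket:
        raise ValueError("S3 sources require a bucket")
    if "://" in path:
        raise ValueError("path must not include protocol or host information")
    body = {"path": path, "type": "s3", "bucket": bucket, "name": name}
    if description is not None:
        body["description"] = description
    return body


# Hypothetical object path; the bucket name matches the example above.
body = s3_dataset_body(
    "/uploads/USDA_Farmers_Market_2014.avro",
    "myBucket",
    "USDA Farmers Market 2014b",
    description="USDA Farmers Market 2014 - copy",
)
```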

Response Body - file:

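The following example is for an S3 file. For an HDFS file, the response has the same structure; based on the request above, expect the type value to be hdfs and the bucket value to be null.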

{
    "id": 63,
    "size": "5053205",
    "path": "/tri-h26/uploads/1/343647c7-5b23-41c8-9397-b40a1ff415ea/USDA_Farmers_Market_2014.avro",
    "dynamicPath": null,
    "type": "s3",
    "bucket": "myBucket",
    "isSchematized": true,
    "isDynamic": false,
    "disableTypeInference": false,
    "createdAt": "2018-02-03T01:08:32.867Z",
    "updatedAt": "2018-02-03T01:08:34.185Z",
    "parsingRecipe": {
        "id": 121
    },
    "runParameters": {
        "data": []
    },
    "name": "USDA Farmers Market 2014b",
    "description": "USDA Farmers Market 2014 - copy",
    "creator": {
        "id": 1
    },
    "updater": {
        "id": 1
    },
    "connection": null
}
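A caller typically needs only a few fields from this response, such as the new dataset's id. The sketch below pulls them out; the function name and the choice of fields are illustrative. Note that size arrives as a JSON string in the examples on this page, so it is converted explicitly.

```python
import json


def summarize_created_dataset(response_text):
    """Extract a few commonly needed fields from the creation response."""
    dataset = json.loads(response_text)
    return {
        "id": dataset["id"],
        "name": dataset["name"],
        "type": dataset["type"],
        # "size" is a string in the example responses, e.g. "5053205".
        "size": int(dataset["size"]),
        "parsing_recipe_id": dataset["parsingRecipe"]["id"],
    }


# Trimmed copy of the S3 response above.
sample = """{
    "id": 63,
    "size": "5053205",
    "type": "s3",
    "bucket": "myBucket",
    "parsingRecipe": {"id": 121},
    "name": "USDA Farmers Market 2014b"
}"""
summary = summarize_created_dataset(sample)
```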

Hive

Request Body - Hive:

Notes:

{
  "visible": true,
  "numFlows": 0,
  "size": -1,
  "type": "jdbc",
  "jdbcType": "TABLE",
  "jdbcPath": [
    "default"
  ],
  "jdbcTable": "farmers_market_recipe_tri",
  "columns": [
    "fmid",
    "market_name"
  ],
  "connectionId": 1,
  "name": "Farmer's Market Data"
} 
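The table-based request body above can be assembled with a small helper. This is a sketch that simply mirrors the fields of the example; the function name is hypothetical, and the fixed values (visible, numFlows, size, jdbcType) are copied from the example rather than taken from a field specification.

```python
def jdbc_table_body(connection_id, jdbc_path, table, columns, name):
    """Build a request body shaped like the table-based example above."""
    return {
        "visible": True,
        "numFlows": 0,
        "size": -1,
        "type": "jdbc",
        "jdbcType": "TABLE",
        "jdbcPath": list(jdbc_path),  # e.g. the database or schema name
        "jdbcTable": table,
        "columns": list(columns),
        "connectionId": connection_id,
        "name": name,
    }


body = jdbc_table_body(
    connection_id=1,
    jdbc_path=["default"],
    table="farmers_market_recipe_tri",
    columns=["fmid", "market_name"],
    name="Farmer's Market Data",
)
```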

Response Example - Hive:

{
    "jdbcTable": "farmers_market_recipe_tri",
    "jdbcPath": [
        "default"
    ],
    "columns": [
        "fmid",
        "market_name"
    ],
    "filter": null,
    "raw": null,
    "id": 19,
    "size": "-1",
    "path": null,
    "dynamicPath": null,
    "type": "jdbc",
    "bucket": null,
    "isSchematized": true,
    "isDynamic": false,
    "disableTypeInference": false,
    "createdAt": "2018-02-26T19:19:33.069Z",
    "updatedAt": "2018-02-26T19:19:33.720Z",
    "parsingRecipe": {
        "id": 35
    },
    "relationalSource": {
        "relationalPath": [
            "default"
        ],
        "columns": [
            "fmid",
            "market_name"
        ],
        "filter": null,
        "raw": null,
        "id": 2,
        "tableName": "farmers_market_recipe_tri",
        "createdAt": "2018-02-26T19:19:33.074Z",
        "updatedAt": "2018-02-26T19:19:33.074Z",
        "importedDataset": {
            "id": 19
        }
    },
    "runParameters": {
        "data": []
    },
    "connection": {
        "name": "hive",
        "creator": {
            "id": 1,
            "email": "ad@example.com",
            "name": "Administrator"
        },
        "id": 1,
        "inaccessibleToUser": true
    },
    "name": "Farmer's Market Data",
    "creator": {
        "id": 1
    },
    "updater": {
        "id": 1
    },
    "workspace": {
        "id": 1
    }
}

Relational

Request Body - Relational:

Notes:

{
  "visible": true,
  "numFlows": 0,
  "size": -1,
  "type": "jdbc",
  "jdbcType": "TABLE",
  "jdbcPath": [
    "public"
  ],
  "jdbcTable": "datasources",
  "columns": [
    "id",
    "size",
    "path"
  ],
  "connectionId": 3,
  "name": "My DB Table"
}

Response Example - Relational:

{
    "jdbcTable": "datasources",
    "jdbcPath": [
        "public"
    ],
    "columns": [
        "id",
        "size",
        "path"
    ],
    "filter": null,
    "raw": null,
    "id": 23,
    "size": "-1",
    "path": null,
    "dynamicPath": null,
    "type": "jdbc",
    "bucket": null,
    "isSchematized": true,
    "isDynamic": false,
    "disableTypeInference": false,
    "createdAt": "2018-02-26T19:32:52.898Z",
    "updatedAt": "2018-02-26T19:32:53.613Z",
    "parsingRecipe": {
        "id": 37
    },
    "relationalSource": {
        "relationalPath": [
            "public"
        ],
        "columns": [
            "id",
            "size",
            "path"
        ],
        "filter": null,
        "raw": null,
        "id": 4,
        "tableName": "datasources",
        "createdAt": "2018-02-26T19:32:52.904Z",
        "updatedAt": "2018-02-26T19:32:52.904Z",
        "importedDataset": {
            "id": 23
        }
    },
    "runParameters": {
        "data": []
    },
    "connection": {
        "name": "postgres",
        "creator": {
            "id": 1,
            "email": "ad@example.com",
            "name": "Administrator"
        },
        "id": 3,
        "inaccessibleToUser": true
    },
    "name": "My DB Table",
    "creator": {
        "id": 1
    },
    "updater": {
        "id": 1
    },
    "workspace": {
        "id": 1
    }
}

Relational with Custom SQL Query

You can submit custom SQL queries to relational or Hive connections. These queries can be used to pre-filter the data inside the database, which improves the performance of the query and of the overall dataset.

Request Body:

Notes:

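The following example runs against the Hive connection (connectionId 1) and quotes identifiers with backticks. For databases that quote identifiers with double-quote marks, such as Oracle, the double-quote marks must be escaped inside the JSON string.

NOTE: Syntax for the custom SQL query varies between relational systems. For syntax examples, see Create Dataset with SQL.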


{
  "visible": true,
  "numFlows": 0,
  "size": -1,
  "type": "jdbc",
  "jdbcType": "TABLE",
  "connectionId": 1,
  "raw": "SELECT * FROM `default`.`farmers_market_recipe_tri`",
  "name": "Farmer's Market Data - Custom SQL Query"
}
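When the request body is built in code, a JSON serializer takes care of escaping any double-quote marks in the SQL. The following is a minimal sketch: the function name, the connection id 7, and the schema and table names are hypothetical, and the fixed field values are copied from the example above.

```python
import json


def custom_sql_body(connection_id, sql, name):
    """Build a request body shaped like the custom SQL example above."""
    return {
        "visible": True,
        "numFlows": 0,
        "size": -1,
        "type": "jdbc",
        "jdbcType": "TABLE",
        "connectionId": connection_id,
        "raw": sql,
        "name": name,
    }


# Hypothetical query using double-quoted identifiers.
sql = 'SELECT * FROM "MY_SCHEMA"."MY_TABLE"'
encoded = json.dumps(custom_sql_body(7, sql, "Custom SQL example"))
print(encoded)
```

In the printed JSON, each double-quote mark inside the raw value appears as `\"`; decoding the JSON yields the original SQL string again.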

Response Body:

In the response, note that the source of the data is defined by the connectionId value and the SQL defined in the raw value.

{
    "jdbcTable": null,
    "jdbcPath": null,
    "columns": null,
    "filter": null,
    "raw": [
        "SELECT * FROM `default`.`farmers_market_recipe_tri`"
    ],
    "id": 21,
    "size": "-1",
    "path": null,
    "dynamicPath": null,
    "type": "jdbc",
    "bucket": null,
    "isSchematized": true,
    "isDynamic": true,
    "disableTypeInference": false,
    "createdAt": "2018-02-26T19:25:33.000Z",
    "updatedAt": "2018-02-26T19:25:33.884Z",
    "parsingRecipe": {
        "id": 36
    },
    "relationalSource": {
        "relationalPath": null,
        "columns": null,
        "filter": null,
        "raw": [
            "SELECT * FROM `default`.`farmers_market_recipe_tri`"
        ],
        "id": 3,
        "tableName": null,
        "createdAt": "2018-02-26T19:25:33.006Z",
        "updatedAt": "2018-02-26T19:25:33.006Z",
        "importedDataset": {
            "id": 21
        }
    },
    "runParameters": {
        "data": []
    },
    "connection": {
        "name": "hive",
        "creator": {
            "id": 1,
            "email": "ad@example.com",
            "name": "Administrator"
        },
        "id": 1,
        "inaccessibleToUser": true
    },
    "name": "Farmer's Market Data - Custom SQL Query",
    "creator": {
        "id": 1
    },
    "updater": {
        "id": 1
    },
    "workspace": {
        "id": 1
    }
}
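Because the response shape differs between file, table, and custom SQL sources, client code may want to tell them apart. The sketch below does so using only fields visible in the example responses on this page; the function name and the returned structure are illustrative.

```python
def describe_source(dataset):
    """Classify a creation response as custom SQL, table, or file based."""
    raw = dataset.get("raw")
    if raw:
        # In the example response, "raw" is a list holding the SQL text.
        return {
            "kind": "custom_sql",
            "connectionId": dataset["connection"]["id"],
            "sql": list(raw),
        }
    if dataset.get("jdbcTable"):
        return {
            "kind": "table",
            "connectionId": dataset["connection"]["id"],
            "table": ".".join(dataset["jdbcPath"] + [dataset["jdbcTable"]]),
        }
    return {"kind": "file", "path": dataset.get("path")}


# Trimmed copy of the custom SQL response above.
custom = {
    "jdbcTable": None,
    "jdbcPath": None,
    "raw": ["SELECT * FROM `default`.`farmers_market_recipe_tri`"],
    "connection": {"name": "hive", "id": 1},
}
source = describe_source(custom)
```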

Reference

For more information on the properties of an imported dataset, see API ImportedDatasets Get v4.