...
- Good for hundreds of MBs. Not suitable for tables that are GBs in size.
- One ingest job per source: a dataset with 3 sources requires 3 ingest jobs.
Rule of thumb for the maximum number of concurrent jobs on a similar edge node:

`max concurrent sources = max cores - cores used for services`
- The above holds until the network becomes the bottleneck. In internal testing, this maxed out at about 15 concurrent sources.
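The sizing rule above can be illustrated with a quick calculation. The core counts below are illustrative assumptions, not values from this document:

```python
# Rule-of-thumb sizing for a hypothetical edge node (illustrative numbers).
total_cores = 16            # assumed: cores on the edge node
cores_for_services = 4      # assumed: cores reserved for platform services

max_concurrent_sources = total_cores - cores_for_services
print(max_concurrent_sources)  # 12
```

Remember that this is only an upper bound on useful concurrency; per the note above, network bandwidth may cap throughput well before the core count does.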
- Defaults: 16 concurrent jobs, a connection pool size of 10, and a 2-minute timeout on the pool. These limits prevent overloading your database.
- Adding more concurrent jobs after the network becomes the bottleneck slows down all transfer jobs simultaneously.
- If processing is fully saturated (the number of workers is maxed out):
- The maximum transfer rate can drop to 1/3 GB per minute.
- Ingest waits up to two minutes to acquire a connection. If no connection can be acquired within two minutes, the job fails.
- When a job is queued for processing:
- The job is silently queued and appears to be in progress.
- The service waits until other jobs complete.
- Currently, there is no queueing timeout based on the maximum number of concurrent ingest jobs.
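The pool-timeout and queueing behavior described above can be sketched as a bounded connection pool. This is a conceptual illustration of the documented semantics, not the product's actual implementation; the function and variable names are hypothetical:

```python
import threading

# Conceptual sketch of the documented ingest behavior (not product code).
# Defaults from the text: connection pool size 10, 2-minute pool timeout.
POOL_SIZE = 10
POOL_TIMEOUT_SECONDS = 120

pool = threading.BoundedSemaphore(POOL_SIZE)

def run_ingest_job(transfer):
    """Run one ingest transfer, honoring the pool timeout.

    Waits up to 2 minutes for a database connection; if none becomes
    available in that window, the job fails (raises).
    """
    if not pool.acquire(timeout=POOL_TIMEOUT_SECONDS):
        raise RuntimeError("Ingest job failed: no connection acquired within 2 minutes")
    try:
        transfer()  # perform the JDBC transfer while holding a connection
    finally:
        pool.release()  # return the connection to the pool
```

Note the contrast with queueing on the concurrent-job limit: the pool acquisition above has a hard 2-minute timeout, whereas a job waiting behind the concurrent-job cap waits indefinitely and still appears to be in progress.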
Limitations
- JDBC ingest caching is not supported for Hive.
Enable
To enable JDBC ingestion and performance caching, both of the following parameters must be enabled.
NOTE: For new installations, this feature is enabled by default. For customers upgrading to Release 5.1 or later, this feature is disabled by default.
Parameter Name | Description
---|---
`webapp.connectivity.ingest.enabled` | Enables JDBC ingestion. Default is `true`.
`feature.jdbcIngestionCaching.enabled` | Enables caching of ingested JDBC data.
`feature.enableLongLoading` | When enabled, you can monitor the ingestion of long-loading JDBC datasets through the Import Data page. Default is
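The exact file and mechanism for setting these parameters depend on your deployment, so the fragment below is only a sketch of how the three parameters might appear as flat keys in a JSON-style configuration:

```json
{
  "webapp.connectivity.ingest.enabled": true,
  "feature.jdbcIngestionCaching.enabled": true,
  "feature.enableLongLoading": true
}
```

Per the note above, verify these values after an upgrade: on upgraded installations the feature is disabled by default and must be turned on explicitly.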
...