...
Tip: If your sources have variation in structure, you should remove the structure from the imported dataset and create your own initial parsing steps to account for the variations. See Initial Parsing Steps.
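As a rough illustration of the kind of variation custom parsing can absorb, the Python sketch below reads CSV text from several sources and aligns columns by header name instead of by position. A column added in the middle of one file then does not shift the columns after it. This runs outside the product and is not how initial parsing steps are defined there; `parse_by_header` is a hypothetical helper, and the sample strings are invented for illustration.

```python
import csv
import io


def parse_by_header(sources):
    """Parse CSV text from several sources whose columns may differ.

    Rows are matched to columns by header name rather than by position, so a
    column inserted in the middle of one source does not shift later columns.
    Columns a source lacks are filled with an empty string.
    """
    all_columns = []  # union of header names, in first-seen order
    records = []      # one dict per data row, keyed by header name
    for text in sources:
        reader = csv.DictReader(io.StringIO(text), restval="")
        for name in reader.fieldnames or []:
            if name not in all_columns:
                all_columns.append(name)
        records.extend(reader)
    # Emit every row against the unified column list.
    rows = [[record.get(col, "") for col in all_columns] for record in records]
    return all_columns, rows


if __name__ == "__main__":
    oldest = "id,name,amount\n1,alpha,10\n"
    newest = "id,name,region,amount\n2,beta,west,20\n"  # column added mid-file
    columns, rows = parse_by_header([oldest, newest])
    print(columns)  # ['id', 'name', 'amount', 'region']
    print(rows)     # [['1', 'alpha', '10', ''], ['2', 'beta', '20', 'west']]
```

Matching by name trades positional simplicity for robustness: it only works when header rows are present and column names are stable across sources.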
Mismatched Schemas
If schemas do not match:
- If the first dataset contains extra columns at the end, subsequent datasets that otherwise match it should import without issues.
- If subsequent datasets contain extra columns at the end, they may import, but issues can occur depending on the situation.
- If subsequent datasets have additional or missing columns in the middle, the results of the import are unpredictable.
- If there are extra columns in the middle of a dataset, you may see extra data in the final column, where the spill-over data has not been split into separate columns.
Ideally, you should fix these issues in the source data. If you cannot, try the following tips:
- After importing a dataset with parameters, take a full-scan random sample. When the new sample is selected:
  - Check the last column of your imported dataset to see whether it contains multiple columns of data. If so, see whether you can split the columns yourself.
  - Scan the column histograms for columns where the number of mismatched, anomalous, or outlier values has suddenly increased. This could be a sign of mismatched schemas.
- Edit the dataset with parameters. Review the parameter definition. Click Update to re-infer the data types of the schemas. This step may address some issues.
- You can use the union tool to import the oldest and most recent sources in your dataset with parameters. If you see variations in the schema, consider modifying the sources to match.
- If your sources have variation in structure, you should remove the structure from the imported dataset and create your own initial parsing steps to account for the variations. See Initial Parsing Steps.
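Some of these checks can also be run against local copies of the source files before import. The Python sketch below assumes comma-delimited files with a header row and no quoted delimiters. It classifies how a later header differs from the first one, roughly following the cases listed under Mismatched Schemas, and flags rows whose final field would still contain the delimiter. Both function names are hypothetical, and the spill-over check models the parser as splitting each line into at most as many fields as the first schema has; that is an assumption, not documented product behavior.

```python
def classify_schema_change(first_header, other_header):
    """Classify how other_header differs from the first source's header.

    Returns "match", "extra_at_end", "missing_at_end", or "other_change".
    """
    if other_header == first_header:
        return "match"
    shared = min(len(first_header), len(other_header))
    if other_header[:shared] == first_header[:shared]:
        # Common prefix is identical, so columns differ only at the end.
        if len(other_header) > len(first_header):
            return "extra_at_end"
        return "missing_at_end"
    # Any other difference, e.g. columns added or removed in the middle.
    return "other_change"


def spilled_rows(text, expected_columns, delimiter=","):
    """Return 1-based data row numbers whose last field still holds the delimiter.

    Simplified model (an assumption): each line is split into at most
    expected_columns fields, so surplus fields stay joined in the last one.
    Quoted fields are not handled.
    """
    flagged = []
    for row_number, line in enumerate(text.splitlines()[1:], start=1):
        fields = line.split(delimiter, expected_columns - 1)
        if delimiter in fields[-1]:
            flagged.append(row_number)
    return flagged


if __name__ == "__main__":
    first = ["id", "name", "amount"]
    newest = ["id", "name", "region", "amount"]
    print(classify_schema_change(first, newest))            # other_change
    print(classify_schema_change(first, first + ["note"]))  # extra_at_end
    newest_file = "id,name,region,amount\n2,beta,west,20\n"
    print(spilled_rows(newest_file, len(first)))            # [1]
```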
Limitations
- You cannot create datasets with parameters from uploaded data.
- You cannot create datasets with parameters from multiple file types.
- File extensions can be parameterized. Mixing file types (for example, TXT and CSV) works only if they are processed identically, which is rare.
- You cannot create parameters across text and binary file types.
- Parameter and variable names can be up to 255 characters in length.
- For regular expression patterns, the following reference types are not supported because they take too long to evaluate:
  - Backreferences: The following example matches `axa`, `bxb`, and `cxc`, yet generates an error: `([a-c])x\1`
  - Lookahead assertions: The following example matches `a`, but only when it is part of an `ab` pattern. It generates an error: `a(?=b)`
- For some source file types, such as Parquet, the schemas between source files must match exactly.
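Before defining a parameter, you might screen its name and pattern against the limits above. The Python sketch below is a hypothetical pre-check, not a product feature: it flags names longer than 255 characters and scans a pattern for numbered backreferences (`\1` through `\9`) and lookahead assertions (`(?=` and `(?!`). It is deliberately simple and does not handle named backreferences or character classes. It also shows that when the captured character comes from a small set, a backreference can be replaced by enumerating the alternatives, assuming alternation with `|` is accepted in your patterns.

```python
import re

MAX_NAME_LENGTH = 255  # stated limit for parameter and variable names


def unsupported_constructs(pattern):
    """Return the unsupported construct types found in a regex pattern.

    Hypothetical pre-check: detects numbered backreferences and lookahead
    assertions only. Named backreferences and character classes are not handled.
    """
    found = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\" and i + 1 < len(pattern):
            if pattern[i + 1] in "123456789":
                found.append("backreference")
            i += 2  # skip the escaped character, so an escaped "\(" is not a group
            continue
        if pattern.startswith("(?=", i) or pattern.startswith("(?!", i):
            found.append("lookahead")
        i += 1
    return found


def check_parameter(name, pattern):
    """Return a list of problems; an empty list means no checked limit was hit."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"name is longer than {MAX_NAME_LENGTH} characters")
    problems.extend(f"unsupported {kind}" for kind in unsupported_constructs(pattern))
    return problems


if __name__ == "__main__":
    print(check_parameter("suffix", r"([a-c])x\1"))  # ['unsupported backreference']
    print(check_parameter("suffix", r"a(?=b)"))      # ['unsupported lookahead']
    # [a-c] has only three members, so the backreference can be enumerated.
    rewritten = r"axa|bxb|cxc"
    print(check_parameter("suffix", rewritten))      # []
    for s in ["axa", "bxb", "cxc", "axb"]:
        same = bool(re.fullmatch(r"([a-c])x\1", s)) == bool(re.fullmatch(rewritten, s))
        print(s, same)  # True for every sample
```

The lookahead example has no equally direct rewrite: matching `ab` instead of `a(?=b)` also consumes the `b`, which changes what the pattern captures.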
...