CsvSource type conversion with custom schema #370
@flexik Have you considered using the `SchemaInferrer`?

```scala
val inferrer = SchemaInferrer(
  SchemaType.String,
  SchemaRule("qty", SchemaType.Int, false),
  SchemaRule(".*_id", SchemaType.Int)
)
CsvSource("myfile").withSchemaInferrer(inferrer)
```

I take your point, though, that if you explicitly pass in a schema, the source should probably use that schema under the hood. We will be looking into this.
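The rule-based inference suggested above can be modelled in a few lines of plain Scala. This is a hypothetical sketch, not eel-sdk's implementation. The `Rule` and `ColType` names are invented here. It assumes that each rule's pattern must match the whole column name, that the first matching rule wins, and that unmatched columns fall back to the default type. It also assumes that the third `SchemaRule` argument above is a nullability flag.

```scala
import scala.util.matching.Regex

// Hypothetical model of rule-based type inference, loosely mirroring
// SchemaInferrer(default, rules*). This is NOT the eel-sdk API.
object RuleInferenceSketch {

  sealed trait ColType
  case object StringCol extends ColType
  case object IntCol    extends ColType
  case object DoubleCol extends ColType

  // A rule pairs a column-name pattern with a target type.
  // `nullable` is carried along but not used by this sketch.
  final case class Rule(pattern: String, colType: ColType, nullable: Boolean = true) {
    private val regex: Regex = pattern.r
    // The whole column name must match, e.g. ".*_id" matches "user_id" but not "id_count".
    def matches(column: String): Boolean = regex.pattern.matcher(column).matches()
  }

  // Assumption: the first matching rule wins; unmatched columns get the default type.
  def infer(header: Seq[String], default: ColType, rules: Rule*): Seq[(String, ColType)] =
    header.map { col =>
      col -> rules.find(_.matches(col)).map(_.colType).getOrElse(default)
    }

  def main(args: Array[String]): Unit = {
    val schema = infer(
      Seq("qty", "user_id", "name"),
      StringCol,
      Rule("qty", IntCol, nullable = false),
      Rule(".*_id", IntCol)
    )
    schema.foreach { case (col, t) => println(s"$col: $t") }
  }
}
```

With these rules, `qty` and `user_id` come out as `IntCol` and `name` falls back to `StringCol`. The ordered rule list lets a specific name override a broader pattern if it is listed first.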
@hannesmiller Updated example using `SchemaInferrer`:

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets

import io.eels.{DataTypeRule, SchemaInferrer}
import io.eels.component.csv.CsvSource
import io.eels.schema._

object CsvSourceTypeConversionTest extends App {

  val exampleCsvString =
    """A,B,C,D
      |1,2.2,3,foo
      |4,5.5,6,bar
    """.stripMargin

  def stream = new ByteArrayInputStream(exampleCsvString.getBytes(StandardCharsets.UTF_8))

  val inferrer = SchemaInferrer(
    StringType,
    DataTypeRule("A", IntType.Signed),
    DataTypeRule("B", DoubleType),
    DataTypeRule("C", IntType.Signed),
    DataTypeRule("D", StringType)
  )

  val ds = new CsvSource(stream _).withSchemaInferrer(inferrer).toDataStream()
  val firstRow = ds.iterator.toIterable.head
  val firstRowA = firstRow.get("A")

  println(firstRowA)                      // prints 1 as expected
  println(firstRowA.getClass.getTypeName) // prints java.lang.String
  assert(firstRowA == 1)                  // fails because firstRowA is a String, not an Int
}
```
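The failing assertion comes down to comparing a `String` with an `Int`: a value of `"1"` held as `Any` is never equal to `1` in Scala, even though both print as `1`. Until the source honours the supplied schema, one possible workaround is to coerce the string values yourself after reading. This is a hypothetical sketch in plain Scala. `CoerceRowSketch`, `Target` and `coerceRow` are invented for illustration, and the sketch does not use eel-sdk's `Row` or schema types.

```scala
import scala.util.Try

// Hypothetical workaround sketch: coerce string values from a CSV row into the
// types the caller declared. Plain Scala, independent of eel-sdk's Row/Schema types.
object CoerceRowSketch {

  sealed trait Target
  case object AsString extends Target
  case object AsInt    extends Target
  case object AsDouble extends Target

  // Convert one raw value; a value that fails to parse becomes a Left naming the column.
  def coerce(column: String, raw: String, target: Target): Either[String, Any] = target match {
    case AsString => Right(raw)
    case AsInt    => Try(raw.trim.toInt).toOption.toRight(s"$column: '$raw' is not an Int")
    case AsDouble => Try(raw.trim.toDouble).toOption.toRight(s"$column: '$raw' is not a Double")
  }

  // Columns without a declared type are left as strings; stops at the first failure.
  def coerceRow(row: Map[String, String], types: Map[String, Target]): Either[String, Map[String, Any]] =
    row.foldLeft[Either[String, Map[String, Any]]](Right(Map.empty)) {
      case (acc, (col, raw)) =>
        for {
          done  <- acc
          value <- coerce(col, raw, types.getOrElse(col, AsString))
        } yield done + (col -> value)
    }

  def main(args: Array[String]): Unit = {
    val raw   = Map("A" -> "1", "B" -> "2.2", "C" -> "3", "D" -> "foo")
    val types = Map("A" -> AsInt, "B" -> AsDouble, "C" -> AsInt, "D" -> AsString)

    val untyped: Any = raw("A")
    println(untyped == 1) // false: the value is a String, not an Int
    coerceRow(raw, types) match {
      case Right(typed) => println(typed("A") == 1) // true after coercion
      case Left(err)    => println(s"conversion failed: $err")
    }
  }
}
```

Returning an `Either` rather than throwing keeps a malformed cell (for example `"x"` in an `Int` column) visible to the caller without aborting the whole read.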
@flexik OK, this may be a bug that was introduced between versions. Either way, I agree the values should match the supplied schema. We will make this a priority for the next release, which is looking like early March, and will keep you posted if we manage to resolve it in an alpha release beforehand. Regards
From the CSV source section of the project README, I got the impression that values loaded from a CSV file are converted according to the specified schema.
However, if I define a custom schema for a `CsvSource` with columns of types other than `String` (for example `Int`), the values in those columns are still returned as `String`. Is this intended behaviour, a bug, or has it just not been implemented yet?
Runnable example: