TODO:

  • Figure out how to pass constructor params (currently db host, port, etc.) to BinLogConsumer(s)
  • Make binlog position saving (currently to file) modular; see the position store sketch after this list
  • Create a tool that facilitates altering a MySQL table and pushing the corresponding Avro schema to the repository.
  • If the MySQL tables have fewer fields than the Avro schema we get an NPE; only set fields that are present, and attempt to set default values on fields we do not get from the database (see the Avro sketch after this list)
  • Add ability to create and register Avro schemas dynamically as tables change
  • Expose the stdout producer through an application that accepts the hostname, port, etc. and simply tails the binlog.
  • Add metrics into various parts of the pipeline.
  • HBase producer
  • HDFS producer (through HFile?)
  • Add option to ignore current saved location and pick up from current master location.
  • If we fall behind replication there is no way for us to detect that and recover when resuming from a position saved to disk (see the detection sketch after this list)
  • Close db connections on exit
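
A minimal sketch of the modular position store from the list above, assuming a hypothetical `BinLogPositionStore` trait; the current file-per-pipe behavior becomes just one implementation (none of these names are mypipe's actual API):

```scala
import java.io.{File, PrintWriter}
import scala.io.Source

case class BinLogPos(fileName: String, offset: Long)

// Pluggable position persistence: file today, ZooKeeper/DB later.
trait BinLogPositionStore {
  def save(pipe: String, pos: BinLogPos): Unit
  def load(pipe: String): Option[BinLogPos]
}

// The current behavior expressed against the trait: one file per pipe.
class FilePositionStore(dir: File) extends BinLogPositionStore {
  def save(pipe: String, pos: BinLogPos): Unit = {
    val w = new PrintWriter(new File(dir, s"$pipe.pos"))
    try w.write(s"${pos.fileName}:${pos.offset}") finally w.close()
  }

  def load(pipe: String): Option[BinLogPos] = {
    val f = new File(dir, s"$pipe.pos")
    if (!f.exists) None
    else {
      val src = Source.fromFile(f)
      try {
        val Array(name, off) = src.mkString.trim.split(":")
        Some(BinLogPos(name, off.toLong))
      } finally src.close()
    }
  }
}
```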
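
For the NPE item, one possible approach is to walk the Avro schema rather than the row, falling back to each field's declared default. A sketch assuming Avro's `GenericData` API and a hypothetical `columns` map holding whatever the binlog row actually contained:

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}
import scala.collection.JavaConverters._

// Set only the fields the database gave us; use the schema's default
// for the rest instead of leaving nulls that later blow up with an NPE.
def buildRecord(schema: Schema, columns: Map[String, Any]): GenericRecord = {
  val record = new GenericData.Record(schema)
  schema.getFields.asScala.foreach { field =>
    columns.get(field.name) match {
      case Some(value) => record.put(field.name, value)
      case None =>
        // getDefaultValue throws if the field declares no default,
        // so this requires defaults on every field a table may lack
        record.put(field.name, GenericData.get.getDefaultValue(field))
    }
  }
  record
}
```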
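
And detecting the fell-behind case when resuming could start with checking whether the saved binlog file still exists on the master via `SHOW BINARY LOGS` (a plain-JDBC sketch; `BinLogPos` is the hypothetical type from the first sketch):

```scala
import java.sql.DriverManager

// If the master has purged the binlog file we saved, we cannot resume
// from the saved position and must fail loudly or re-snapshot.
def savedPositionStillValid(jdbcUrl: String, user: String, pass: String,
                            saved: BinLogPos): Boolean = {
  val conn = DriverManager.getConnection(jdbcUrl, user, pass)
  try {
    val rs = conn.createStatement().executeQuery("SHOW BINARY LOGS")
    var found = false
    while (rs.next()) if (rs.getString("Log_name") == saved.fileName) found = true
    found
  } finally conn.close()
}
```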

DONE:

  • Use single topic per table for the Kafka generic producer
  • Logging format should be controlled through config file, not code
  • Create Kafka consumer
  • Periodically refresh MySQL table metadata -> instead, handle ALTER events. This is taken care of since we cache tables by MySQL table ID, which will change when the table's metadata changes (see the cache sketch after this list).
  • Split off Cassandra producer into mypipe-cassandra
  • Add Kafka 0.8 producer
  • Do not use TableMapEventData in the MySQLMetadataManager (third party API leak into ours)
  • MySQLSpec should create the user table if it is not present.
  • Add MySQLMetadataManagerSpec
  • Split out MySQL table metadata fetcher from BinlogConsumer
  • Add a latency test / gauge used to tell us how long it takes for an event to enter the MySQL database and get acted upon by a producer (queued, not flushed); see the gauge sketch after this list
  • Make pipe name part of binlog position file name on disk (currently only producer name is in there)
  • Error handling when a producer fails
  • Move producers outside BinlogConsumer and into Pipe
  • Split into multi-project build (api, samples, producers, mypipe)
  • Create a test that will consume information and write it to a file, then validate it. The test can issue the queries too.
  • Create pipes, where we join a bunch of consumers to a producer (tracks its own binlog progress)
  • If the binlog position does not move there is no need to flush it to disk (see the flusher sketch after this list)
  • Create stdout producer
  • Create ColumnMetadata type and make it part of Column and use it in Table
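
A sketch of the table-ID keyed cache described in the ALTER item above; the names are illustrative, not mypipe's actual classes:

```scala
import scala.collection.concurrent.TrieMap

case class TableMeta(id: Long, db: String, name: String, columns: Seq[String])

// ALTER TABLE causes the next TABLE_MAP event to carry a new table ID,
// so a cache keyed by ID never serves stale metadata; refetch on a miss.
object TableCache {
  private val byId = TrieMap.empty[Long, TableMeta]

  def lookup(id: Long)(fetch: => TableMeta): TableMeta =
    byId.getOrElseUpdate(id, fetch)
}
```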
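
The latency gauge amounts to comparing the binlog event's server-side timestamp with the time a producer queues it; a sketch, with no particular metrics library assumed:

```scala
import java.util.concurrent.atomic.AtomicLong

// Queue-time latency: "now" minus the timestamp in the binlog event
// header, recorded when a producer queues (not flushes) the event.
class ProducerLatencyGauge {
  private val lastLatencyMs = new AtomicLong(0)

  def onQueued(eventTimestampMs: Long): Unit =
    lastLatencyMs.set(System.currentTimeMillis() - eventTimestampMs)

  def latencyMs: Long = lastLatencyMs.get
}
```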
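
And the no-op flush item reduces to remembering the last position written (reusing the hypothetical `BinLogPositionStore` and `BinLogPos` from the TODO section's first sketch):

```scala
// Skip the disk write entirely when the position has not advanced.
class PositionFlusher(store: BinLogPositionStore, pipe: String) {
  @volatile private var last: Option[BinLogPos] = None

  def maybeFlush(current: BinLogPos): Unit =
    if (!last.contains(current)) {
      store.save(pipe, current)
      last = Some(current)
    }
}
```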