This repository has been archived by the owner on Feb 25, 2020. It is now read-only.
Kafka is a large, complex piece of software which requires installation and maintenance. There are many ways for Kafka to fail, and Kafka requires ongoing management in order to prevent disk overflow and make tradeoffs between recoverability and resource usage.
While Kafka is very appropriate for a large scale distributed ingest system which has to keep up with fluctuating loads and be fully redundant, it is less appropriate for a single node analytics engine like Precog. When Precog becomes distributed, the focus will be on reading data from HDFS, and not on the ingest of that data, so even long-term, the direct use of Kafka in the Precog project is an unnecessary distraction.
To reduce the number of moving pieces in Precog, Kafka needs to be eliminated as a dependency.
Ingest can be as simple as batching up a chunk of data and writing it out to the (abstract) file system -- e.g. appending to the relevant file.
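As a rough illustration of the "batch and append" idea, here is a minimal sketch. `BatchingFileIngest`, its parameters, and the newline-delimited record format are all hypothetical and not part of Precog; it also writes to a local `java.nio` path rather than the abstract file system mentioned above.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardOpenOption}
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper (not Precog code): buffers records in memory and
// appends them to a single file once `batchSize` records have accumulated.
final class BatchingFileIngest(target: Path, batchSize: Int) {
  require(batchSize > 0, "batchSize must be positive")

  private val buffer = ArrayBuffer.empty[String]

  // Queue one record; triggers a flush when the batch is full.
  def ingest(record: String): Unit = synchronized {
    buffer += record
    if (buffer.size >= batchSize) flush()
  }

  // Append all buffered records to the target file, one per line.
  def flush(): Unit = synchronized {
    if (buffer.nonEmpty) {
      val bytes = buffer.mkString("", "\n", "\n").getBytes(StandardCharsets.UTF_8)
      Files.write(target, bytes, StandardOpenOption.CREATE, StandardOpenOption.APPEND)
      buffer.clear()
    }
  }
}
```

Records that are still buffered when the process dies are lost in this sketch, so a real implementation would have to decide how much recoverability it wants to trade for simplicity.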
This ticket will be considered complete when Kafka is not a dependency of the project nor referenced or utilized anywhere in the source code, unit tests, or documentation.
This is very easy to achieve: simply create a new subproject that derives from the ingest and bifrost projects, and when you mix the cake together, replace the KafkaEventStore with an EventStore implementation that passes messages directly to the routing actor, and exclude the KafkaShardIngestActor from the cake entirely.
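To make the shape of that swap concrete, here is a rough sketch of the cake wiring. Every trait and method name in it is a hypothetical stand-in, not Precog's actual API, and the routing actor is modelled as a plain method so the example runs without Akka.

```scala
import scala.collection.mutable.ArrayBuffer

// All names below are hypothetical stand-ins, not Precog's real traits.
final case class Event(path: String, data: String)

trait EventStore {
  def save(event: Event): Unit
}

// Cake slice: anything that needs an EventStore depends on this.
trait EventStoreComponent {
  def eventStore: EventStore
}

// Stand-in for the routing actor: something that can receive an Event.
trait RoutingComponent {
  def route(event: Event): Unit
}

// Replacement slice: no Kafka, events are handed straight to routing.
trait DirectEventStoreComponent extends EventStoreComponent { self: RoutingComponent =>
  lazy val eventStore: EventStore = new EventStore {
    def save(event: Event): Unit = self.route(event)
  }
}

// Ingest logic only knows about the abstract EventStoreComponent.
trait IngestService { self: EventStoreComponent =>
  def ingest(path: String, data: String): Unit =
    eventStore.save(Event(path, data))
}

// Mixing the cake: the Kafka-backed slices are simply never mixed in.
object StandaloneServer
    extends IngestService
    with DirectEventStoreComponent
    with RoutingComponent {
  val routed = ArrayBuffer.empty[Event]

  def route(event: Event): Unit = {
    routed += event
    ()
  }
}
```

Calling `StandaloneServer.ingest("/foo", "{}")` hands the event directly to `route`; nothing in `IngestService` has to change when the store implementation is swapped.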
See @nuttycom's comment above.