It provides a service for streaming logs into Hadoop.
It is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data into HDFS.
YARN coordinates data ingest from Apache Flume and other services that deliver raw data into an Enterprise Hadoop cluster.
Its simple, extensible data model supports online analytic applications.
It is highly fault-tolerant, reliable, manageable, scalable, and customizable.
It collects data from multiple servers in real time as well as in batch mode.
Large volumes of data can be imported and analyzed in real time.
It can collect data from a large set of sources and move it to multiple destinations.
It supports multi-hop flows, fan-in and fan-out flows, and contextual routing, among others.
It scales horizontally.
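
The flow types listed above can be illustrated with a small toy model. The sketch below is not Flume's API or configuration format. The names Event, Agent, forward, and the channel names "out", "hdfs", and "alerts" are all hypothetical, chosen only for illustration. It assumes each event carries a "type" header and shows two edge agents feeding one collector (fan-in across two hops), which then routes by header value (contextual routing) and copies error events to two channels (fan-out).

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Event:
    """A log record: a body plus string headers used for routing."""
    body: str
    headers: dict = field(default_factory=dict)


class Agent:
    """Toy stand-in for one hop: routes each event to named channels."""

    def __init__(self, name, routes, default):
        self.name = name
        self.routes = routes      # header value -> list of channel names
        self.default = default    # channels used when no route matches
        self.channels = defaultdict(list)

    def put(self, event, header="type"):
        # Contextual routing: pick target channels from a header value.
        targets = self.routes.get(event.headers.get(header), self.default)
        for channel in targets:   # more than one target means fan-out
            self.channels[channel].append(event)

    def drain(self, channel):
        # Return and clear everything buffered in one channel.
        events, self.channels[channel] = self.channels[channel], []
        return events


def forward(src, channel, dst):
    """Move everything buffered in one agent's channel to the next hop."""
    moved = src.drain(channel)
    for event in moved:
        dst.put(event)
    return len(moved)


# Two edge agents (one per server) and one collector: fan-in, two hops.
web1 = Agent("web1", routes={}, default=["out"])
web2 = Agent("web2", routes={}, default=["out"])
collector = Agent(
    "collector",
    routes={"error": ["hdfs", "alerts"]},  # errors fan out to two channels
    default=["hdfs"],
)

web1.put(Event("GET /index 200", {"type": "access"}))
web2.put(Event("disk full", {"type": "error"}))

for edge in (web1, web2):
    forward(edge, "out", collector)

hdfs = [e.body for e in collector.drain("hdfs")]
alerts = [e.body for e in collector.drain("alerts")]
print(hdfs)
print(alerts)
```

In this model, scaling horizontally would correspond to adding more edge agents or collectors and forwarding between them in the same way. A real deployment would express the same topology in agent configuration rather than in code like this.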