There are different kinds of systems, each with different merits.
A service waits for requests or instructions from a client to arrive; when one arrives, the service tries to handle it as quickly as possible and sends a response back.
One key performance measure is response time.
A batch processing system takes a large amount of input data, runs a job to process it, and produces some output data. Jobs often take a while and are typically scheduled to run periodically.
One key performance measure is throughput.
Stream processing sits somewhere between online and offline/batch processing. Like a batch processing system, a stream processor consumes inputs and produces outputs, but a stream job operates on events shortly after they happen, whereas a batch job operates on a fixed set of input data.
The Unix approach to processing data is simple and efficient. Unix programs such as `awk` and `sort` share the same interface (text streams on stdin and stdout), which makes it easy to pipe data from one program into the next. This simplicity comes from the Unix philosophy: each program does one thing well and expects its output to become the input of some other, as yet unknown, program. In addition, Unix gives us a uniform interface (files and pipes) and a separation of logic from wiring: a program neither knows nor cares where its input comes from or where its output goes.
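As a concrete sketch in this style (the log file path, and the assumption that the requested URL is the seventh whitespace-separated field, are illustrative rather than taken from this text), a pipeline that reports the five most requested URLs in a web server access log could look like:

```bash
cat /var/log/nginx/access.log |  # read the raw log lines
  awk '{print $7}' |             # extract the requested URL (7th field)
  sort |                         # bring identical URLs next to each other
  uniq -c |                      # count how often each URL occurs
  sort -r -n |                   # sort by that count, largest first
  head -n 5                      # keep only the five most frequent
```

Each program does one small job, and the pipe does the wiring between them.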
MapReduce is like Unix tools, but distributed across potentially thousands of machines.
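Hadoop's Streaming interface makes that parallel explicit: any executable that reads records on stdin and writes results to stdout can serve as the map or reduce function, just like a program in a Unix pipeline. A minimal sketch, where the streaming jar name and the input/output paths are placeholders:

```bash
# Run a distributed job whose map and reduce phases are ordinary Unix
# commands reading stdin and writing stdout; jar name and HDFS paths
# below are placeholders, not values from this text.
hadoop jar hadoop-streaming.jar \
  -input  /data/input \
  -output /data/output \
  -mapper  /bin/cat \
  -reducer /usr/bin/wc
```

Between the two phases the framework sorts and shuffles the mapper output by key, playing much the same role as the `sort` step in the pipeline above.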