There are different kinds of systems, each with different merits.

Services (online systems)

A service waits for requests or instructions from a client to arrive; when one arrives, it tries to handle the request as quickly as possible and sends back a response.

One key performance measurement is response time.

Batch Processing Systems (offline systems)

A batch processing system takes a large amount of input data, runs a job to process it, and produces some output data. Jobs often take a while to complete and are typically scheduled to run periodically.

One key performance measurement is throughput.

Stream Processing Systems (near-real-time systems)

Stream processing is somewhere between online and offline/batch processing. Like a batch processing system, a stream processor consumes inputs and produces outputs. But a stream job operates on events shortly after they happen, whereas a batch job operates on a fixed set of input data.
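To make the contrast concrete, here is a minimal Python sketch (the data sources are made up for illustration): the batch job runs over a bounded input that exists in full before the job starts, while the stream job handles each event shortly after it arrives.

```python
import queue

def batch_job(path):
    # Batch: the input is a fixed, complete dataset; the job reads it all
    # and produces an output when it finishes.
    with open(path) as f:
        records = f.readlines()
    return len(records)

def stream_job(events: queue.Queue):
    # Stream: events keep arriving on an unbounded queue; each one is
    # processed shortly after it happens.
    while True:
        event = events.get()   # blocks until the next event arrives
        if event is None:      # sentinel used only to end this sketch
            break
        print("processed", event)
```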


The Unix Philosophy

The Unix approach to processing data is simple and efficient. Unix programs such as awk and sort all share the same interface, so data can easily be piped from one program into the next (see the sketch below). This composability comes from the Unix philosophy:

  1. Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new “features”.
  2. Expect the output of every program to become the input to another, as yet unknown, program. Avoid stringently columnar or binary input formats.
  3. Design and build software, even operating systems, to be tried early, ideally within weeks. Don’t hesitate to throw away the clumsy parts and rebuild them.
  4. Use tools in preference to unskilled help to lighten a programming task, even if you have to detour to build the tools and expect to throw some of them out after you finish using them.

In addition, Unix tools compose well because of their uniform interface (a stream of text lines read from stdin and written to stdout) and the separation of logic from wiring: a program doesn't need to know or care where its input comes from or where its output goes.
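As a rough illustration, here is a minimal Python sketch of that style: each stage is a small, single-purpose function, and the stages are wired together the way awk, sort, and uniq are wired together with pipes. The log file name and field position are assumptions made up for this example.

```python
from collections import Counter

def read_lines(path):
    # Emit one log line at a time, like `cat access.log`.
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")

def extract_url(lines):
    # Pick the 7th whitespace-separated field, like `awk '{print $7}'`.
    # (Assumes a web-server log format where field 7 is the requested URL.)
    for line in lines:
        fields = line.split()
        if len(fields) >= 7:
            yield fields[6]

def count_top(items, n):
    # Count occurrences and keep the most common,
    # like `sort | uniq -c | sort -rn | head`.
    return Counter(items).most_common(n)

# Wiring: each stage only knows about its input stream, not about the
# stage before or after it, so stages can be swapped independently.
print(count_top(extract_url(read_lines("access.log")), 5))
```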

MapReduce and Distributed Filesystems

MapReduce

MapReduce is like Unix tools, but distributed across potentially thousands of machines.
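As a rough, single-machine illustration of the programming model (not Hadoop's actual API), here is a toy word-count job in Python: the mapper emits key-value pairs, a shuffle step groups values by key, and the reducer aggregates each group. The function names and sample data are made up for this example.

```python
from collections import defaultdict

def mapper(record):
    # Map: emit (word, 1) for every word in one input record.
    for word in record.lower().split():
        yield word, 1

def reducer(key, values):
    # Reduce: aggregate all values that were emitted for one key.
    return key, sum(values)

def mapreduce(records):
    # Shuffle: group mapper output by key. A real framework does this by
    # sorting partitioned mapper output files across many machines.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Each reducer call is independent, so in a cluster the groups could
    # be processed in parallel on different machines.
    return [reducer(key, values) for key, values in sorted(groups.items())]

docs = ["the quick brown fox", "the lazy dog", "the quick dog"]
print(mapreduce(docs))
# [('brown', 1), ('dog', 2), ('fox', 1), ('lazy', 1), ('quick', 2), ('the', 3)]
```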