MapReduce Client
MapReduce is the key algorithm that the Hadoop MapReduce engine uses to distribute work around a cluster. The key aspect of the MapReduce algorithm is that if every Map and Reduce is independent of all other ongoing Maps and Reduces, then the operation can be run in parallel on different keys and lists of data. On a large cluster of machines, you can go one step further, and run the Map operations on servers where the data lives. Rather than copy the data over the network to the program, you push out the program to the machines. The output list can then be saved to the distributed filesystem, and the reducers run to merge the results. Again, it may be possible to run these in parallel, each reducing different keys.
module.exports =
deps:
iptables: module: 'masson/core/iptables', local: true
krb5_client: module: 'masson/core/krb5_client', local: true
java: module: 'masson/commons/java', local: true
test_user: module: 'ryba/commons/test_user', local: true, auto: true
hadoop_core: module: 'ryba/hadoop/core', local: true, auto: true, implicit: true
hdfs_client: module: 'ryba/hadoop/hdfs_client', required: true
yarn_client: module: 'ryba/hadoop/yarn_client', required: true
yarn_nm: module: 'ryba/hadoop/yarn_nm', required: true
yarn_rm: module: 'ryba/hadoop/yarn_rm', required: true
yarn_tr: module: 'ryba/hadoop/yarn_tr'
yarn_ts: module: 'ryba/hadoop/yarn_ts', single: true
mapred_jhs: module: 'ryba/hadoop/mapred_jhs', single: true
configure:
'ryba/hadoop/mapred_client/configure'
commands:
'check':
'ryba/hadoop/mapred_client/check'
'report': [
'masson/bootstrap/report'
'ryba/hadoop/mapred_client/report'
]
'install': [
'ryba/hadoop/mapred_client/install'
'ryba/hadoop/mapred_client/check'
]