[Hadoop] Map&Reduce

sunshout 2011. 7. 19. 02:45

Overview:
- Map : maps the input data to key : value pairs
- Reduce : reduces the Map output to a single result

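-> The Map & Reduce concept is really simple. It works on the same principle as Python's map and reduce functions: express the computation in a form the computer handles well, then reduce the results to a single value.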
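
To make the idea concrete, here is a minimal sketch in plain Java (no Hadoop involved). The class name MapReduceIdea and the method totalLength are made up for this illustration; it only uses the standard java.util.stream API, where map transforms each element and reduce folds the results into one value.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class, not part of Hadoop.
class MapReduceIdea {
    // Map step: transform every element independently (word -> its length).
    // Reduce step: fold the mapped values into one result (their sum).
    static int totalLength(List<String> words) {
        return words.stream()
                .map(String::length)
                .reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("map", "reduce", "hadoop");
        System.out.println(totalLength(words)); // 3 + 6 + 6 = 15
    }
}
```

Hadoop applies the same two steps, but spreads the map calls over many machines and groups the mapped values by key before reducing them.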

- The MapReduce framework has a single master called the JobTracker, and each cluster node runs one slave TaskTracker.

Input & Output process
(input) <k1, v1> --> MAP --> <k2, v2> --> combine --> <k2,v2> --> REDUCE --> <k3,v3> (output)
In other words, the input is mapped to key, value pairs of the form <k2,v2>, and the values are then reduced to a single value per key.
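
The flow above can be simulated in plain Java to see what each stage receives and emits. This is only a sketch under some assumptions: the class WordCountPipeline and its methods are hypothetical names, every input line is treated as if it were its own map task, and the same summing function is used for both the combine and the reduce stage (the combiner being an optional local pre-aggregation of one map task's output). No Hadoop classes are used.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of MAP -> combine -> REDUCE for word counting.
class WordCountPipeline {
    // MAP: <k1, v1> = <line offset, line text>  ->  many <k2, v2> = <word, 1>
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Used for both combine and REDUCE: sum all values that share a key.
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return sums;
    }

    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> combined = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            // combine: pre-aggregate the output of this one "map task" locally
            combined.addAll(sumByKey(map(offset, line)).entrySet());
            offset += line.length() + 1;
        }
        // REDUCE: merge the partial sums into the final <k3, v3> = <word, total>
        return sumByKey(combined);
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("hello hadoop hello", "hello map reduce")));
        // prints {hadoop=1, hello=3, map=1, reduce=1}
    }
}
```

For the first line the combine step already turns the two <hello, 1> pairs into one <hello, 2>, so fewer pairs reach the reduce stage. The final result is the same with or without the combine step.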

Implementing the Map & Reduce functions
map and reduce are implemented by extending two Hadoop classes, Mapper and Reducer.

public class Mapper<K1, V1, K2, V2>
- Converts the input <K1, V1> into the intermediate representation <K2, V2>
- Override the map function of the Mapper class
- Example) WordCount

public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    ...

    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // emit an intermediate <k2, v2> pair
        context.write(k2, v2);
    }

}

public class Reducer<K2, V2, K3, V3>
- Converts the intermediate <K2, V2> values into the final <K3, V3> values
- Override the reduce function
(For a single intermediate key K2 the Mapper may have emitted several values, and reduce is the step that collapses those values.)
- reduce(KEYIN key, Iterable<VALUEIN> values, Reducer.Context context)


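import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class IntSumReducer<Key> extends Reducer<Key, IntWritable, Key, IntWritable> {
    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Key key, Iterable<IntWritable> values,
                       Context context) throws IOException, InterruptedException {
        // sum every value emitted for this key
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        // Context has write(), not collect()
        context.write(key, result);
    }
}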

Reference: http://hadoop.apache.org/common/docs/stable/mapred_tutorial.html
