Running an application in Spark standalone cluster mode

batterflyyin 2019. 1. 16. 14:27

- Based on Spark 2.2.2.

- Assumes the network firewall and hostname resolution are already configured.

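1. Enter the hostname or IP address of each slave server in conf/slaves.

2. Start the master with sbin/start-master.sh. (The workers listed in conf/slaves can then be started with sbin/start-slaves.sh; sbin/start-all.sh starts the master and the workers together.)

3. Open port 8080 on the master's IP address to check the web UI.

4. The title at the very top of the page shows an address starting with spark://spark-master ... Use this address as the master URL when running an application.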

5-1 To start the Spark shell

./bin/spark-shell --master spark://IP:PORT

5-2 To run a production application

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

--class: The entry point of the application (e.g. org.apache.spark.examples.SparkPi)
--master: The master URL of the cluster (e.g. spark://23.195.26.187:7077)
--deploy-mode: Whether to run the driver on a worker node (cluster) or locally as an external client (client). The default is client.
--conf: An arbitrary Spark configuration property in key=value format. For values that contain spaces, wrap "key=value" in quotes.
application-jar: Path to a bundled jar containing the application and all of its dependencies. The URL must be globally visible inside the cluster, for instance an hdfs:// path or a file:// path that is present on all nodes.
application-arguments: Arguments passed to the main method of the main class, if any.
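
As a minimal sketch of how the template above can be filled in, the script below assembles a spark-submit command line from shell variables and only prints it (a dry run), so it can be inspected before anything is submitted. The variable names (MAIN_CLASS, MASTER_URL, DEPLOY_MODE, APP_JAR) and the build_submit_cmd function are hypothetical names chosen for illustration; the values are the example values used in this post.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints a spark-submit command line.
# It does not call Spark; copy the printed command to run it.

MAIN_CLASS="org.apache.spark.examples.SparkPi"
MASTER_URL="spark://23.195.26.187:7077"
DEPLOY_MODE="client"
APP_JAR="/path/to/examples.jar"

build_submit_cmd() {
  # Options come first, then the application jar.
  printf '%s' "./bin/spark-submit --class $MAIN_CLASS --master $MASTER_URL --deploy-mode $DEPLOY_MODE $APP_JAR"
  # Everything passed to this function becomes an application argument.
  for arg in "$@"; do
    printf ' %s' "$arg"
  done
  printf '\n'
}

build_submit_cmd 100
```

The order matters: spark-submit treats everything after the application jar as arguments to the application's main method, so options such as --master have to appear before the jar path.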

Examples

# Run the application locally on 8 cores (on a single local machine)
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[8] \
  /path/to/examples.jar \
  100

# Run on a Spark standalone cluster in client deploy mode
# (client is the default when --deploy-mode is omitted)
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

# Run on a Spark standalone cluster in cluster deploy mode with supervise
# (--supervise restarts the driver automatically if it exits with a non-zero exit code)
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

# Run on a YARN cluster in cluster deploy mode
# (--deploy-mode can be client for client mode)
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000

# Run a Python application on a Spark standalone cluster
./bin/spark-submit \
  --master spark://207.184.161.138:7077 \
  examples/src/main/python/pi.py \
  1000

# Run on a Mesos cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master mesos://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  http://path/to/examples.jar \
  1000
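
The examples above differ mainly in the value passed to --master. As a rough sketch, the function below maps a --master value to the cluster manager it targets. The master_kind function is a hypothetical helper, not part of Spark, and it covers only the forms that appear in this post.

```shell
#!/bin/sh
# Hypothetical helper: reports which cluster manager a --master value targets.
# Only the forms used in the examples above are recognized.
master_kind() {
  case "$1" in
    local|local\[*\]) echo "local" ;;       # e.g. local[8]
    spark://*)        echo "standalone" ;;  # e.g. spark://IP:PORT
    yarn)             echo "yarn" ;;
    mesos://*)        echo "mesos" ;;
    *)                echo "unknown" ;;
  esac
}

for m in "local[8]" "spark://207.184.161.138:7077" "yarn" "mesos://207.184.161.138:7077"; do
  echo "$m -> $(master_kind "$m")"
done
```

A check like this could run before submitting, for example to confirm that a script intended for the standalone cluster was not left pointing at a local master.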