Spark Basics - IT Notes

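Quick start (basics):

Download Spark, unpack it, and paste the following command from inside the unpacked directory. It runs one of the bundled example programs (the jar under examples/).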

bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--executor-memory 1G \
--total-executor-cores 2 \
./examples/jars/spark-examples_2.11-2.1.1.jar \
100

Basic syntax
bin/spark-submit \
--class <main-class> \
--master <master-url> \
--deploy-mode <deploy-mode> \
--conf <key>=<value> \
... # other options
<application-jar> \
[application-arguments]

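Parameters:
--master: the address of the master. If omitted, Spark runs in local mode (local[*]).
--class: the entry-point class of your application (e.g. org.apache.spark.examples.SparkPi)
--deploy-mode: whether to deploy your driver on a worker node (cluster) or run it locally as an external client (client) (default: client)
--conf: an arbitrary Spark configuration property in key=value format. If the value contains spaces, wrap the whole pair in quotes: "key=value"
application-jar: the packaged application jar, including its dependencies. The URL must be globally visible inside the cluster, for example an hdfs:// path on shared storage. If it is a file:// path, the same jar must exist at that path on every node.
application-arguments: arguments passed to the main() method of the entry-point class
--executor-memory 1G: gives each executor 1G of memory
--total-executor-cores 2: sets the total number of CPU cores used by all executors combined to 2 (not 2 per executor). This option applies to standalone and Mesos clusters only.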
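
To make the quoting rule for --conf concrete, here is a small dry-run sketch in POSIX shell. The helper show_args is hypothetical (it is not part of Spark); it only prints each argument it receives in brackets, so you can see that a quoted key=value pair arrives as one argument. The master local[2] and the spark.executor.extraJavaOptions value are illustrative assumptions, not settings taken from these notes.

```shell
# show_args is a hypothetical helper, not a Spark tool.
# It prints every argument on its own line, wrapped in brackets,
# so the effect of shell quoting is visible.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Same shape as a spark-submit call; only the command name differs.
# The --conf value contains a space, so the whole key=value pair is quoted.
show_args \
  --class org.apache.spark.examples.SparkPi \
  --master "local[2]" \
  --deploy-mode client \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  ./examples/jars/spark-examples_2.11-2.1.1.jar \
  100
```

Replacing show_args with bin/spark-submit should submit the same job for real, assuming the command is run from the unpacked Spark directory.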

Example program (word count, entered in bin/spark-shell, which provides sc)

sc.textFile("/opt/module/spark/input/word.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).collect
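
For a small input file, the Spark result can be cross-checked with an ordinary shell pipeline. This is only a rough sketch: word_count is a hypothetical helper name, and it assumes words are separated by single spaces. Unlike the Spark one-liner, it drops empty tokens (which appear when a line is blank or contains consecutive spaces), so counts for the empty string will differ.

```shell
# word_count is a hypothetical helper for sanity-checking small files.
# Steps: put each space-separated token on its own line, drop empty lines,
# sort so equal words are adjacent, count them, then print "word count".
word_count() {
  tr ' ' '\n' < "$1" | grep -v '^$' | sort | uniq -c | awk '{print $2, $1}'
}

# Usage (path taken from the example above):
# word_count /opt/module/spark/input/word.txt
```

The output is sorted by word, whereas collect in Spark returns the pairs in no guaranteed order, so compare the two as sets rather than line by line.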