Memo: Scala WordCount in two environments

IDEA (local) environment:

package wordcount

import org.apache.spark.{SparkConf, SparkContext}

object wordCountScala extends App {
  // setMaster("local") runs Spark locally inside IDEA; remove it when submitting to a cluster
  val conf = new SparkConf().setAppName("Wordcount").setMaster("local")
  val sc = new SparkContext(conf)

  val line = sc.textFile("D:\\win7远程\\14期大数据潭州课件\\第三阶段:实时开发(plus)\\2020-0105-Spark-SQL\\数据\\wordcount.txt")

  // split each line into words, pair each word with 1, then sum the counts per word
  val result = line.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)

  result.foreach(println)
}
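The transformation chain itself is ordinary collection logic, so it can be sanity-checked without Spark at all. A minimal local sketch (the object and method names are mine, not from the original; Scala 2.13's `groupMapReduce` stands in for `reduceByKey`):

```scala
object WordCountLocal {
  // Same pipeline as the Spark job, on a plain Scala collection:
  // split into words, pair each with 1, then sum the 1s per word.
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))
      .map((_, 1))
      .groupMapReduce(_._1)(_._2)(_ + _) // group by word, sum the counts

  def main(args: Array[String]): Unit =
    println(count(Seq("hello spark", "hello scala")))
}
```

Running `main` prints a map with hello counted twice and spark/scala once each, matching what the Spark job computes per word.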

HDFS (cluster) environment:

package wordcount

import org.apache.spark.{SparkConf, SparkContext}

object wordCountScala_HDFS extends App {
  // no setMaster here: the master is supplied via spark-submit's --master flag
  val conf = new SparkConf().setAppName("Wordcount")
  val sc = new SparkContext(conf)

  val line = sc.textFile("hdfs://bigdata166:9000/testdata/wordcount.txt")

  val result = line.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)

  result.foreach(println) // runs on the executors
  result.collect()        // brings the results back to the driver
}

 

[root@bigdata166 bin]# ./spark-submit --master spark://bigdata166:7077 --class wordcount.wordCountScala_HDFS ../testjar/scalaTest-1.0-SNAPSHOT.jar

 

Other pitfalls:

Sometimes leaving spark-shell open leads to out-of-memory problems; the console will prompt you to check the Spark UI page.

With collect() the output ends up in the driver's log, while a plain foreach(println) goes to the executors' stdout, it seems.
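This matches Spark's documented behavior: `foreach(println)` executes on the executors, so the output is scattered across executor logs; to see it in the driver's own stdout, collect first. A hedged sketch, assuming a live `SparkContext` and the `result` RDD from the jobs above:

```scala
// Sketch only: requires the SparkContext and `result` RDD defined above.
// collect() pulls the RDD's elements back to the driver as a local Array,
// so this println runs in the driver JVM and appears in spark-submit's stdout.
result.collect().foreach(println)
```

For large RDDs, prefer `take(n)` over `collect()` so the driver is not asked to hold the whole dataset.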

 
