
Spark daily pitfalls

Basic Spark usage

Using Spark on a cluster

Jupyter Notebook

  • Installing Jupyter Notebook
  • ImportError: No module named pyspark

    • The cause: the environment variables are not set
      # Spark 1.6.0
      export SPARK_HOME=/usr/local/spark
      export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH
      # mind the py4j version: the zip name must match the one under $SPARK_HOME/python/lib
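
      As an alternative to exporting PYTHONPATH, the findspark package (assuming it is installed, e.g. via pip) can patch sys.path from inside the notebook; a minimal sketch:

      import findspark
      findspark.init('/usr/local/spark')  # same path as SPARK_HOME above

      import pyspark  # the ImportError should now be gone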
  • first try

    from pyspark import SparkContext, SparkConf

    # connect to the standalone master, capping executor memory and total core count
    conf = SparkConf().setAppName('YOURNAME').setMaster('spark://mu01:7077').set('spark.executor.memory', '4G').set('spark.cores.max', '80')
    sc = SparkContext(conf=conf)

    # distribute a local list across the cluster and fetch its first element
    data = [1, 2, 3, 4, 5]
    distData = sc.parallelize(data)
    print(distData.first())
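
    As a quick follow-up (a minimal sketch building on the objects above), run one transformation and one action to confirm the executors respond, then release the reserved cores:

    # square each element on the executors, then collect the results back to the driver
    print(distData.map(lambda x: x * x).collect())  # [1, 4, 9, 16, 25]
    sc.stop()  # frees the cores capped by spark.cores.max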