Everyday Spark Pitfalls

Posted on 2017-07-17 | Updated on 2019-05-09

Basic Spark usage

Using Spark on a cluster

Jupyter Notebook

Installing Jupyter Notebook

ImportError: No module named pyspark

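This error means Python cannot find the pyspark package, because the SPARK_HOME and PYTHONPATH environment variables have not been set: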

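# Written for Spark 1.6.0, installed under /usr/local/spark
export SPARK_HOME=/usr/local/spark
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.4-src.zip
# The py4j version in the path must match the zip that your Spark ships;
# the file name differs between releases, so check with: ls $SPARK_HOME/python/lib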
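
When editing the shell profile is not convenient, the same two paths can be added from inside the notebook before importing pyspark. The following is only a minimal sketch: the helper names `pyspark_paths` and `add_pyspark_to_path` are hypothetical, and the `/usr/local/spark` fallback is simply the install location used in this post. It looks up the bundled py4j zip by pattern, so the version number does not have to be typed by hand.

```python
import glob
import os
import sys


def pyspark_paths(spark_home):
    """Return the two sys.path entries PySpark needs under spark_home."""
    python_dir = os.path.join(spark_home, "python")
    lib_dir = os.path.join(python_dir, "lib")
    # Match whichever py4j version this Spark build bundles.
    py4j_zips = sorted(glob.glob(os.path.join(lib_dir, "py4j-*-src.zip")))
    if not py4j_zips:
        raise IOError("no py4j-*-src.zip found under %s" % lib_dir)
    return [python_dir, py4j_zips[-1]]


def add_pyspark_to_path(spark_home=None):
    """Prepend the PySpark paths to sys.path and return the Spark home used."""
    # Assumption: fall back to $SPARK_HOME, then to /usr/local/spark.
    spark_home = spark_home or os.environ.get("SPARK_HOME", "/usr/local/spark")
    for path in reversed(pyspark_paths(spark_home)):
        if path not in sys.path:
            sys.path.insert(0, path)
    return spark_home
```

Calling `add_pyspark_to_path()` in the first notebook cell should then let `import pyspark` succeed, provided Spark really is installed at that location.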

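First try

from pyspark import SparkConf, SparkContext
# Replace 'YOURNAME' with your application name; spark://mu01:7077 is the cluster master
conf = (SparkConf()
        .setAppName('YOURNAME')
        .setMaster('spark://mu01:7077')
        .set('spark.executor.memory', '4G')
        .set('spark.cores.max', '80'))
sc = SparkContext(conf=conf)
data = [1, 2, 3, 4, 5]
distData = sc.parallelize(data)  # distribute the local list as an RDD
print(distData.first())  # print() with parentheses works on Python 2 and 3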

Author: zydarChen
Link: https://www.zydarchen.top/20170717/7_debug_spark/
Copyright notice: Unless otherwise stated, all posts on this blog are licensed under BY-NC-SA. Please credit the source when reposting.
