After setting up Spark 1.6.0 locally, besides submitting Python programs with spark-submit, we can use the PyCharm IDE to develop and debug locally, which makes development more efficient. The configuration is simple, and the steps can also be found on Stack Overflow. IntelliJ IDEA can likewise be used to develop Spark programs in Python once the Python plugin is installed; the configuration steps are the same.
Original post on my blog: http://blog.tomgou.xyz/shi-yong-pycharmpei-zhi-sparkde-pythonkai-fa-huan-jing.html
0. Install PyCharm and py4j
My system environment: Ubuntu 14.04.4 LTS.
Download the latest version of PyCharm from the official website: https://www.jetbrains.com/pycharm/download/
Unpack pycharm-5.0.4.tar.gz with the following command:
$ tar xfz pycharm-5.0.4.tar.gz
Then run pycharm.sh from the bin subdirectory of the unpacked folder to start PyCharm.
Install py4j with pip:
$ sudo pip install py4j
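To confirm that py4j is importable by the interpreter PyCharm will use, you can run a quick check like the one below with that interpreter. It is a minimal sketch assuming Python 3.4 or later (for importlib.util.find_spec); `missing_modules` is a hypothetical helper name, not part of py4j or Spark.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that this interpreter cannot find."""
    # find_spec returns None when a top-level module cannot be located on sys.path
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["py4j"])
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("py4j is importable")
```

If py4j is reported as missing, pip most likely installed it for a different Python than the one selected as the project interpreter in PyCharm.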
1. Configure PyCharm
Open PyCharm and create a project.
Then select “Run” -> “Edit Configurations” -> “Environment variables”.
Add the SPARK_HOME and PYTHONPATH environment variables:
SPARK_HOME: the Spark installation directory
PYTHONPATH: the python directory under the Spark installation directory
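If you would rather not depend on the run configuration, the same two settings can be applied at the top of a script before pyspark is imported. The sketch below is an alternative under stated assumptions, not a PyCharm feature: `configure_spark_paths` is a hypothetical helper, and it assumes Spark's bundled py4j sits under python/lib as a py4j-*-src.zip file (the version in the file name differs between Spark releases).

```python
import glob
import os
import sys


def configure_spark_paths(spark_home):
    """Hypothetical helper: set SPARK_HOME and extend sys.path the way the
    PyCharm run configuration does, so that `import pyspark` can succeed."""
    os.environ["SPARK_HOME"] = spark_home
    python_dir = os.path.join(spark_home, "python")
    # Assumption: Spark ships its own py4j as python/lib/py4j-<version>-src.zip
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    added = [python_dir] + py4j_zips
    # Prepend each path (if not already present) so Spark's copies take precedence
    for path in added:
        if path not in sys.path:
            sys.path.insert(0, path)
    return added
```

For example, calling configure_spark_paths("/home/tom/spark-1.6.0") (the installation path used in the test program) before `from pyspark import SparkContext` should let the import resolve even when the script is started outside PyCharm. Because the entries are prepended, Spark's bundled py4j, when found, takes precedence over the pip-installed one, which keeps the py4j version matched to the Spark release.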
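2. Test PyCharm
Run a small Spark program to check the setup:
"""SimpleApp"""
from pyspark import SparkContext

logFile = "/home/tom/spark-1.6.0/README.md"  # any text file works; here Spark's own README
sc = SparkContext("local", "Simple App")     # run locally with a single worker thread
logData = sc.textFile(logFile).cache()        # cache: the RDD is reused by two actions

# Count the lines that contain 'a' and the lines that contain 'b'
numAs = logData.filter(lambda s: 'a' in s).count()
numBs = logData.filter(lambda s: 'b' in s).count()

print("Lines with a: %i, lines with b: %i" % (numAs, numBs))
sc.stop()  # release the local Spark context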
Output:
Lines with a: 58, lines with b: 26
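The Spark job simply counts the lines of README.md that contain 'a' and those that contain 'b'. As a sanity check on the numbers, the same count can be done in plain Python without Spark. This is a minimal sketch: `count_lines_containing` is a hypothetical helper, not part of PySpark, and it assumes Python 3 and a UTF-8 text file.

```python
def count_lines_containing(path, substring):
    """Plain-Python analogue of logData.filter(lambda s: substring in s).count()."""
    with open(path, encoding="utf-8") as f:
        # Lines read here keep their trailing newline, which does not affect
        # a membership test for characters such as 'a' or 'b'.
        return sum(1 for line in f if substring in line)
```

Calling count_lines_containing("/home/tom/spark-1.6.0/README.md", "a") and the same call with "b" should agree with numAs and numBs, as long as both read the same file.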