Using PyCharm to configure the Python development environment for Spark (basic)

Time: 2021-4-8

After setting up Spark 1.6.0 locally, besides submitting Python programs with spark-submit, we can develop and debug them locally in the PyCharm IDE, which makes development much more efficient. The configuration is simple, and the steps can also be found on Stack Overflow. IntelliJ IDEA can likewise be used to develop Spark programs in Python once the Python plugin is installed; the configuration steps are the same.

Original post on my blog: http://blog.tomgou.xyz/shi-yong-pycharmpei-zhi-sparkde-pythonkai-fa-huan-jing.html

0. Install PyCharm and py4j

My system environment: Ubuntu 14.04.4 LTS

Download the latest version of PyCharm from the official website: https://www.jetbrains.com/pycharm/download/

Installation steps:

  • Unpack pycharm-5.0.4.tar.gz with the following command: tar xfz pycharm-5.0.4.tar.gz

  • Run pycharm.sh from the bin subdirectory

Install py4j:

$ sudo pip install py4j
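To confirm that py4j is visible to the interpreter PyCharm will use, a small check can be run from that interpreter. This is a minimal sketch, assuming Python 3.4 or later (for importlib.util.find_spec); missing_modules is a hypothetical helper, not part of py4j or Spark.

```python
import importlib.util


def missing_modules(names):
    """Return the module names this interpreter cannot find (hypothetical helper)."""
    # find_spec returns None when a top-level module is not importable
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # py4j should be found after `pip install py4j`; pyspark only once
    # PYTHONPATH points at Spark's python directory (step 1)
    missing = missing_modules(["py4j", "pyspark"])
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("py4j and pyspark are both importable")
```

Run it once from a terminal and once inside PyCharm: if pyspark is reported missing only in PyCharm, the run configuration's environment variables are the likely culprit.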

1. Configure PyCharm

Open PyCharm and create a project.
Then select “Run” -> “Edit Configurations” -> “Environment variables”.
Add the SPARK_HOME and PYTHONPATH environment variables:

  • SPARK_HOME: the Spark installation directory

  • PYTHONPATH: the python directory under the Spark installation directory
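The same two settings can also be applied from inside a script, which helps when the script is run outside PyCharm. This is a minimal sketch, assuming Spark is unpacked at /home/tom/spark-1.6.0 (the path used in the test program in step 2); add_spark_to_path is a hypothetical helper. It also adds any bundled python/lib/py4j-*-src.zip that Spark distributions usually ship, which is optional when py4j was installed with pip.

```python
import glob
import os
import sys


def add_spark_to_path(spark_home):
    """Mirror the SPARK_HOME / PYTHONPATH run settings in code (hypothetical helper)."""
    os.environ["SPARK_HOME"] = spark_home
    spark_python = os.path.join(spark_home, "python")
    added = [spark_python]
    # Optional: the py4j source zip bundled with Spark, if present
    pattern = os.path.join(spark_python, "lib", "py4j-*-src.zip")
    added.extend(sorted(glob.glob(pattern)))
    # Insert in reverse so spark_python ends up first on sys.path
    for path in reversed(added):
        if path not in sys.path:
            sys.path.insert(0, path)
    return added


if __name__ == "__main__":
    # Adjust this path to your own Spark installation
    for p in add_spark_to_path("/home/tom/spark-1.6.0"):
        print("added to sys.path:", p)
```

The helper has to run before the first `from pyspark import ...` line, because sys.path is only consulted at import time.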


2. Test PyCharm

Run a small Spark program to check the setup:

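"""SimpleApp"""

from pyspark import SparkContext

# Text file to analyse: the README shipped with Spark
logFile = "/home/tom/spark-1.6.0/README.md"
# Run Spark locally with a single worker thread
sc = SparkContext("local", "Simple App")
# Load the file as an RDD of lines; cache it because it is scanned twice
logData = sc.textFile(logFile).cache()

# Count the lines containing the letters 'a' and 'b'
numAs = logData.filter(lambda s: 'a' in s).count()
numBs = logData.filter(lambda s: 'b' in s).count()

print("Lines with a: %i, lines with b: %i" % (numAs, numBs))

# Release the resources held by the SparkContext
sc.stop()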

Output:

Lines with a: 58, lines with b: 26
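To sanity-check these numbers without Spark, the same filter/count logic can be written as a plain Python loop over the file. This is a minimal sketch; count_lines_with is a hypothetical helper, and it assumes the file is UTF-8 (or plain ASCII) text.

```python
def count_lines_with(path, *letters):
    """Count the lines that contain each letter, like filter(...).count() above."""
    counts = {letter: 0 for letter in letters}
    with open(path, encoding="utf-8") as f:
        for line in f:
            for letter in letters:
                if letter in line:
                    counts[letter] += 1
    return counts
```

For example, count_lines_with("/home/tom/spark-1.6.0/README.md", "a", "b") should return the same two numbers the Spark job prints for the same copy of README.md.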