1. Python support for Spark 2.4.8
minimum version: 3.4
maximum version: 3.7
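The supported range above can be expressed as a tiny version check. This is a sketch, not part of Spark itself; the `supported` helper and the `MIN_PY`/`MAX_PY` names are illustrative, and the 3.4–3.7 bounds come from the note above.

```python
import sys

# Spark 2.4.8 supports Python 3.4 through 3.7 (3.8+ is not supported).
MIN_PY = (3, 4)
MAX_PY = (3, 7)

def supported(version_info):
    """Return True if a (major, minor, ...) tuple is inside the range."""
    return MIN_PY <= tuple(version_info[:2]) <= MAX_PY

# Check the interpreter currently running this script.
print(supported(sys.version_info))
```

Running this under the 3.7.13 interpreter installed below prints `True`; under a system Python 3.8 or newer it prints `False`, which is exactly why pyenv is needed.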
2. Install pyenv, a tool that installs and manages multiple Python versions side by side
3. pyenv install 3.7.13
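After the build finishes, a quick sanity check confirms that pyenv is on PATH and that 3.7.13 is actually available. This is a sketch assuming only a POSIX shell; `pyenv versions` lists every interpreter pyenv manages.

```shell
# Sanity check: is pyenv installed, and did 3.7.13 get built?
if command -v pyenv >/dev/null 2>&1; then
  if pyenv versions | grep -q 3.7.13; then
    STATUS=ready
  else
    STATUS=not-built   # rerun: pyenv install 3.7.13
  fi
else
  STATUS=no-pyenv      # install pyenv first (step 2)
fi
echo "$STATUS"
```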
4. Add the properties below to ~/.bashrc
export PYENV_ROOT=~/.pyenv
export PATH=$PATH:$PYENV_ROOT/bin
export PYTHONPATH=$PYTHONPATH:$SPARK_HOME/python:/home/mkm/.pyenv/versions/3.7.13/lib/python3.7
export PYTHONPATH=$PYTHONPATH:$SPARK_HOME/python/lib/py4j-0.10.7-src.zip
export PYSPARK_PYTHON=python3
export PATH=$PATH:/home/mkm/.pyenv/versions/3.7.13/bin
eval "$(pyenv init --path)"
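The two PYTHONPATH lines matter because PySpark imports its own sources from `$SPARK_HOME/python` and the bundled py4j bridge from the zip under `python/lib`. A throwaway check of the same path construction (the temporary SPARK_HOME below is a stand-in, not your real install):

```shell
# Fake a SPARK_HOME layout and build PYTHONPATH the same way .bashrc does,
# then confirm the py4j zip ends up on the path.
SPARK_HOME=$(mktemp -d)
mkdir -p "$SPARK_HOME/python/lib"
touch "$SPARK_HOME/python/lib/py4j-0.10.7-src.zip"

PYTHONPATH="$SPARK_HOME/python"
PYTHONPATH="$PYTHONPATH:$SPARK_HOME/python/lib/py4j-0.10.7-src.zip"

case "$PYTHONPATH" in
  *py4j-0.10.7-src.zip) echo "py4j on PYTHONPATH" ;;
  *) echo "py4j missing from PYTHONPATH" ;;
esac
```

If the py4j zip is missing or the version in its filename differs (check `ls $SPARK_HOME/python/lib`), `import pyspark` fails with a py4j ImportError.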
5. Open a new terminal
$ pyspark
Python 3.7.13 (default, Aug 5 2024, 10:41:15)
[GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
24/08/05 21:27:19 WARN util.Utils: Your hostname, mkm-virtual-machine resolves to a loopback address: 127.0.1.1; using 192.168.62.130 instead (on interface ens33)
24/08/05 21:27:19 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
24/08/05 21:27:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to Spark
Using Python version 3.7.13 (default, Aug 5 2024 10:41:15)
SparkSession available as 'spark'.
>>>
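With the shell up, a small smoke test confirms the 3.7 interpreter can drive Spark end to end. Inside the pyspark shell the `spark` session already exists, so only the `createDataFrame`/`count` lines are needed there; the builder call and the hypothetical `py37-smoke-test` app name are for running this as a standalone script, and the import guard lets the snippet degrade gracefully where pyspark is not importable.

```python
# Smoke test: build a two-row DataFrame and count it.
try:
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[1]")
             .appName("py37-smoke-test")   # hypothetical app name
             .getOrCreate())
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])
    row_count = df.count()
    spark.stop()
except ImportError:
    # pyspark not importable: the PYTHONPATH entries from step 4 are
    # not in effect in this environment.
    row_count = None

print(row_count)
```

A count of 2 means the interpreter, py4j bridge, and Spark sources are all wired up correctly.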