NameError: name 'sc' is not defined - Fixed
Beginners often hit this error while submitting a PySpark job with the spark-submit tool. In this section we will see why the error occurs and how to fix it.
Look at the error carefully:
NameError: name 'sc' is not defined
It says that the name 'sc' is not defined in the program, so the program cannot be executed. The interactive pyspark shell creates 'sc' for you, but a script run through spark-submit does not. So, in your PySpark program you first have to create a SparkContext and store the object in a variable called 'sc'. By convention developers use the name 'sc' for the SparkContext object, but if you wish you can use any variable name of your choice.
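The error itself is an ordinary Python error, not something specific to Spark. The following minimal sketch uses no Spark at all and simply refers to a variable named sc that was never assigned, which is the same situation as a script that uses sc without creating it:

```python
# No Spark is needed here: using any name that was never assigned
# raises the same NameError.
try:
    sc.parallelize([1, 2, 3])  # 'sc' was never created in this script
except NameError as err:
    message = str(err)

print(message)  # name 'sc' is not defined
```

Python stops at the first use of sc, before any Spark code gets a chance to run.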
So, what is 'sc'?
The object sc refers to the SparkContext object in PySpark. This object is used to perform operations on the Spark cluster.
The SparkContext is the main entry point for running any Spark program. It represents the connection to the Spark cluster and is used to create RDDs, work with accumulators, and perform any other processing on the cluster.
Here is a screenshot of spark-submit throwing the error when sc is not defined:
To resolve this error, add the following import to your program:
from pyspark import SparkContext
Then create 'sc' with the following code:
sc = SparkContext("local", "Hello World App")
This properly initializes the sc object, and your program will run without this error.
Check more tutorials at: PySpark Tutorials - Learning PySpark from beginning.