  [Spark] The case-sensitive option of Spark
    Open Source/Spark 2020. 10. 21. 12:56

    Spark version used: 2.4.4

    Columns of a Spark DataFrame can be selected regardless of the case of their names.

    This is because Spark's case-sensitivity option (spark.sql.caseSensitive) defaults to False.
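    The current value can be checked through the session's runtime configuration. A minimal sketch, assuming an existing SparkSession named spark (on a default session the call is expected to return the string 'false'):

    from pyspark.sql import SparkSession

    # Create (or reuse) a session; the variable name `spark` matches the examples below.
    spark = SparkSession.builder.getOrCreate()

    # Read the current value of the case-sensitivity option.
    print(spark.conf.get('spark.sql.caseSensitive'))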

     

    1. Example with the case-sensitive option set to False (default)

    • Column names are matched ignoring case, so selecting 'ABC' here is ambiguous because both ABC and abc exist (a non-conflicting sketch follows the error output below).
    # Two columns whose names differ only by case.
    df = spark.createDataFrame([{'ABC': 1, 'abc': 2}])
    df.show()
     
    '''
    output:
    +---+---+
    |ABC|abc|
    +---+---+
    |  1|  2|
    +---+---+
    '''
     
    df.select('ABC').show()
     
    '''
    output:
    Traceback (most recent call last):
        ...
    pyspark.sql.utils.AnalysisException: "Reference 'ABC' is ambiguous, could be: ABC, ABC.;"
    '''
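
    For contrast, when the column names do not collide, the case-insensitive default lets any casing resolve to the column. A minimal sketch under the same default setting (df2 is a separate DataFrame introduced here for illustration):

    # Single column named ABC, so there is no ambiguity.
    df2 = spark.createDataFrame([{'ABC': 1}])

    # With spark.sql.caseSensitive left at False, 'abc' resolves to the ABC column,
    # so this select succeeds instead of raising an AnalysisException.
    df2.select('abc').show()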

    2. Example with the case-sensitive option set to True

    • You can select a column only by its exact, case-matching name.
    df = spark.createDataFrame([{'ABC': 1, 'abc': 2}])
    df.show()
     
    '''
    output:
    +---+---+
    |ABC|abc|
    +---+---+
    |  1|  2|
    +---+---+
    '''
     
    # Enable case-sensitive column resolution for this session.
    spark.conf.set('spark.sql.caseSensitive', True)
    df.select('ABC').show()
     
    '''
    output:
    +---+
    |ABC|
    +---+
    |  1|
    +---+
    '''
     
    df.select('abc').show()
     
    '''
    output:
    +---+
    |abc|
    +---+
    |  2|
    +---+
    '''
     
    df.select('Abc').show()
     
    '''
    output:
    Traceback (most recent call last):
        ...
    pyspark.sql.utils.AnalysisException: "cannot resolve '`Abc`' given input columns: [ABC, abc];;\n'Project ['Abc]\n+- LogicalRDD [ABC#39L, abc#40L], false\n"
    '''
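
    The option can also be fixed when the session is created, instead of toggling it at runtime. A minimal sketch, assuming a standalone script (the application name is arbitrary):

    from pyspark.sql import SparkSession

    # Build a session with case-sensitive column resolution enabled from the start.
    spark = SparkSession.builder \
        .appName('case-sensitive-example') \
        .config('spark.sql.caseSensitive', 'true') \
        .getOrCreate()

    To return to the default behaviour within the same session, set the option back with spark.conf.set('spark.sql.caseSensitive', False).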