[Spark] The case-sensitive option of Spark
2020. 10. 21. 12:56
Spark version used: 2.4.4
By default, a column of a Spark DataFrame can be selected regardless of the case of its name.
This is because Spark resolves column names according to the spark.sql.caseSensitive option, which defaults to false.
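The rule can be pictured with a small plain-Python model. This is only a sketch of the behaviour described in this post, not Spark's actual resolver: `resolve_column` is a hypothetical helper, and comparing names with `str.lower()` is a simplifying assumption.

```python
# Plain-Python model of the lookup rule described above.
# NOT Spark's implementation: resolve_column is a hypothetical helper.
def resolve_column(columns, name, case_sensitive=False):
    """Return the single entry of `columns` that `name` refers to."""
    if case_sensitive:
        # Only an exact match counts.
        matches = [c for c in columns if c == name]
    else:
        # Names are compared ignoring case (simplified with str.lower()).
        matches = [c for c in columns if c.lower() == name.lower()]
    if not matches:
        raise LookupError("cannot resolve %r given input columns: %s" % (name, columns))
    if len(matches) > 1:
        raise LookupError("reference %r is ambiguous, could be: %s" % (name, matches))
    return matches[0]


print(resolve_column(['ABC', 'xyz'], 'abc'))        # prints ABC
print(resolve_column(['ABC', 'abc'], 'abc', True))  # prints abc
```

With `case_sensitive=False`, asking for `'ABC'` in `['ABC', 'abc']` finds two candidates and fails, which mirrors the ambiguity error in the first example of this post.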
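1. Example with the case-sensitive option set to False (default)
- Column names are matched ignoring case, so when two columns differ only in case, Spark cannot tell them apart and selecting either one raises an ambiguity error.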
df = spark.createDataFrame([{'ABC': 1, 'abc': 2}])
df.show()
'''
output:
+---+---+
|ABC|abc|
+---+---+
|  1|  2|
+---+---+
'''
df.select('ABC').show()
'''
output:
Traceback (most recent call last):
...
pyspark.sql.utils.AnalysisException: "Reference 'ABC' is ambiguous, could be: ABC, ABC.;"
'''
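To spot this situation before a select fails, one option is to check the column names for collisions up front. The sketch below is plain Python and does not touch Spark; `case_insensitive_collisions` is a hypothetical helper name, and it assumes that lowercasing is a close enough stand-in for Spark's case-insensitive comparison.

```python
from collections import defaultdict


def case_insensitive_collisions(columns):
    """Group column names that become identical when case is ignored."""
    groups = defaultdict(list)
    for name in columns:
        groups[name.lower()].append(name)
    # Keep only the groups where more than one name collides.
    return [names for names in groups.values() if len(names) > 1]


print(case_insensitive_collisions(['ABC', 'abc', 'id']))  # prints [['ABC', 'abc']]
```

For a real DataFrame, the list to pass in would be `df.columns`. An empty result suggests that every column can be selected safely with the default setting.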
2. Example with the case-sensitive option set to True
- A column is selected only when the given name matches the DataFrame's column name exactly, including case.
df = spark.createDataFrame([{'ABC': 1, 'abc': 2}])
df.show()
'''
output:
+---+---+
|ABC|abc|
+---+---+
|  1|  2|
+---+---+
'''
spark.conf.set('spark.sql.caseSensitive', True)
df.select('ABC').show()
'''
output:
+---+
|ABC|
+---+
|  1|
+---+
'''
df.select('abc').show()
'''
output:
+---+
|abc|
+---+
|  2|
+---+
'''
df.select('Abc').show()
'''
output:
Traceback (most recent call last):
...
pyspark.sql.utils.AnalysisException: "cannot resolve '`Abc`' given input columns: [ABC, abc];;\n'Project ['Abc]\n+- LogicalRDD [ABC#39L, abc#40L], false\n"
'''
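Because spark.conf.set changes the option for the rest of the session, it may be convenient to switch it on only around the selects that need it. The following is a minimal sketch, not an official PySpark utility: `temporary_conf` is a hypothetical helper, and it assumes that the object passed in offers `get(key)` and `set(key, value)` the way `spark.conf` does, and that `get` returns the current value of the key.

```python
from contextlib import contextmanager


@contextmanager
def temporary_conf(conf, key, value):
    """Set conf[key] to value inside the with-block, then restore it."""
    previous = conf.get(key)  # assumption: returns the current setting
    conf.set(key, value)
    try:
        yield
    finally:
        # Restore the earlier value even if the block raised an error.
        conf.set(key, previous)


# Intended use with the objects from the example above (needs a SparkSession):
# with temporary_conf(spark.conf, 'spark.sql.caseSensitive', 'true'):
#     df.select('ABC').show()
```

The option is passed as the string 'true' here so that the restored value, which this sketch assumes comes back from `get` as a string, has the same type.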