Spark 2.0.1 cannot write to the /tmp/hive folder

Description

I have three nodes: the first runs the metaserver and the Spark master, the second runs a chunkserver, and the third runs a Spark slave.
All of these nodes have a Linux qfs account with the same uid and gid:

qfs:x:502:503::/home/qfs:/bin/bash

The QFS servers and the Spark servers run under the qfs account to keep things simple.

The metaserver config file (MetaServer.prp) contains the following:

metaServer.rootDirUser = 502
metaServer.rootDirGroup = 503
metaServer.rootDirMode = 0755

metaServer.defaultLoadUser = 502
metaServer.defaultLoadGroup = 503
metaServer.defaultLoadFileMode = 0644
metaServer.defaultLoadDirMode = 0755

I can run qfsshell on node 3 under the qfs account and create a directory successfully, but when I run spark-shell on node 3, still under the qfs account, I run into the problem below.

The interesting thing is that the /tmp/hive folder is created when I run my Scala script in spark-shell, but Spark cannot write to this folder afterwards.

How can I fix this? Is there a log that can help me diagnose this kind of problem?

scala> val df = spark.read.option("delimiter","\t").csv("/data/files/gsm*");
java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx-wx--x
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:189)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263)
at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39)
at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38)
at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46)
at org.apache.spark.sql.hive.HiveSharedState.externalCatalog(HiveSharedState.scala:45)
at org.apache.spark.sql.hive.HiveSessionState.catalog$lzycompute(HiveSessionState.scala:50)
at org.apache.spark.sql.hive.HiveSessionState.catalog(HiveSessionState.scala:48)
at org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:63)
at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63)
at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:382)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:143)
at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:413)
at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:349)
... 48 elided
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx-wx--x
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 71 more

scala> :q

Activity

Dean Chen
October 18, 2016, 10:51 AM

Just a reminder: I have re-opened this issue and changed its type to improvement.

Kevin Stinson
October 18, 2016, 1:16 AM

QFS uses the process's umask when creating files and directories.

HDFS uses a Configuration property (fs.permissions.umask-mode) to define the umask used for these operations. When Hive creates the /tmp/hive directory, it temporarily overrides the value of that property so it can set the permissions it wants.
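
The pattern is roughly the following (a sketch against the public Hadoop FileSystem API, not Hive's exact code):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.fs.permission.FsPermission

val conf = new Configuration()
val savedUmask = conf.get("fs.permissions.umask-mode")   // null if unset
conf.set("fs.permissions.umask-mode", "000")             // temporarily mask nothing
try {
  val fs = FileSystem.get(conf)
  // Hive requests rwx-wx-wx (0733) for its scratch root; with the umask
  // cleared, HDFS creates the directory with exactly those bits.
  fs.mkdirs(new Path("/tmp/hive"), new FsPermission(Integer.parseInt("733", 8).toShort))
} finally {
  // Put the original umask setting back.
  if (savedUmask != null) conf.set("fs.permissions.umask-mode", savedUmask)
  else conf.unset("fs.permissions.umask-mode")
}

With QFS, the process umask is applied instead; 0733 masked by a typical umask of 002 yields 0731, which matches the rwx-wx--x in the error above.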

Since QFS ignores this property, the permissions are not set to what Hive expects.

I will look at changing QFS to honor the umask property.

In the meantime, as a workaround, you can pre-create the directory with the correct permissions using qfsshell.
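
For example (a sketch, assuming the metaserver is reachable at meta-host:20000; substitute your own metaserver host and port):

qfsshell -s meta-host -p 20000 -q -- mkdir /tmp/hive
qfsshell -s meta-host -p 20000 -q -- chmod 0777 /tmp/hive

If your qfsshell build does not accept a command after the "--" separator, run the same mkdir and chmod commands at the interactive prompt. Note the leading 0 on the mode (see the comment below).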

Dean Chen
October 17, 2016, 12:48 PM

After using the chmod command correctly in qfsshell, it is fixed.

Use chmod 0777 instead of chmod 777 on the /tmp/hive folder.

Status

Done

Assignee

Kevin Stinson

Reporter

Dean Chen

Priority

Highest