Data Science Interview Preparation

Spark SQL and Machine Learning - Beginner

1. Which API in Spark is used for executing SQL queries?

SparkContext
SparkDriver
SparkSession
SparkExecutor

2. Which of the following statements are true about DataFrames in Spark? (More than one answer may be correct.)

DataFrame is immutable.
DataFrame represents a distributed collection of data organized into named columns.
DataFrame is only used for structured data processing.
DataFrame can only be created from RDDs.

3. Which method is used to load a CSV file into a DataFrame in PySpark?

spark.loadCSV()
spark.read.csv()
spark.loadFile()
spark.read.text()

4. Which method is used to display the schema of a DataFrame in PySpark?

showSchema()
displaySchema()
viewSchema()
printSchema()

5. Which method is the current, recommended way to register a DataFrame as a temporary view in PySpark?

createOrReplaceTempView()
registerTempTable()
createTempTable()
tempView()

6. Which machine learning library is integrated with Spark for building ML pipelines?

scikit-learn
MLlib
TensorFlow
Keras

7. Which method is used to split a DataFrame into training and testing sets in PySpark?

randomSplit()
trainTestSplit()
split()
divide()

8. Which of the following is NOT a supervised learning algorithm available in Spark MLlib?

Decision Trees
Logistic Regression
K-Means
Random Forest

9. Which method is used to train a machine learning model with the DataFrame-based Spark MLlib API (pyspark.ml)?

train()
trainModel()
fitModel()
fit()

10. Which method is used to make predictions on a DataFrame with a trained model in the DataFrame-based Spark MLlib API (pyspark.ml)?

predict()
transform()
predictModel()
makePredictions()