Data Science Interview Preparation

Spark SQL and Machine Learning - Advanced

1. Which optimizer in Spark SQL is rule-based and designed to improve query performance?

Catalyst Optimizer
Tungsten Optimizer
Pluto Optimizer
Athena Optimizer

2. Which algorithm in Spark MLlib is used for collaborative filtering?

K-Means
Alternating Least Squares (ALS)
Decision Trees
Naive Bayes

3. Which function in Spark SQL removes all cached tables from memory?

clearCache()
removeCache()
deleteCache()
unpersistTable()

4. In Spark MLlib, which method is used to evaluate the performance of a regression model?

score()
predict()
evaluate()
assess()

5. Which algorithm in Spark MLlib is used for frequent pattern mining?

FP-Growth
Apriori
K-Means
DBSCAN

6. Which feature transformer in Spark MLlib is used for dimensionality reduction?

StringIndexer
VectorAssembler
PCA
OneHotEncoder

7. Which DataFrameWriter method in Spark SQL specifies the partitioning columns when writing a DataFrame to a table?

repartition()
distributeBy()
sortBy()
partitionBy()

8. Which class in Spark MLlib is used to perform hyperparameter tuning for machine learning models?

CrossValidator
GridSearchCV
RandomizedSearchCV
HyperparameterOptimizer

9. Which algorithm in Spark MLlib is used for outlier detection?

Local Outlier Factor (LOF)
One-Class SVM
Isolation Forest
DBSCAN

10. Which Spark MLlib feature transformer converts a column of string categories into numerical indices?

StringIndexer
VectorAssembler
PCA
OneHotEncoder