Hive Interview Questions and Answers for Freshers and Experienced
Before getting on to the Hive interview questions, you should know that Hive is a continuously evolving field. Students and professionals need to keep upgrading their skills with its new features to stay fit for Hive-related jobs. This post on Hive interview questions and answers for freshers and experienced candidates will help you find answers to the questions frequently asked in your upcoming Hive interview.
With thousands of vacancies available for Hive developers, candidates must be acquainted with all the components of Hive technology. In-depth knowledge of the subject gives students the best employment opportunities in the future, and knowing every detail of Hive is the best approach to solving Hive-related problems.
APTRON has spent many hours researching the Hive interview questions and answers that you might encounter in your upcoming interview. Working through these questions will help you crack the interview and stand out among your competitors.
First, let us look at how Hive technology is evolving today and how much demand it is likely to see in the coming years. According to one study, most companies and businesses have moved to Hive, so the future looks bright for people experienced in Hive and related technologies.
Hence, if you are looking to boost your profile and secure your future, Hive can help you reach the zenith of your career. It also offers plenty of opportunities for freshers.
These questions are a strong foundation for your preparation. Read and re-read the questions and their answers to get accustomed to what you will be asked in the interview. They will also help you on your way to mastering the skills. Local and global businesses, large and medium-sized, are hiring high-quality Hive professionals.
This list of Hive interview questions walks you quickly through the key topics, including architecture, CREATE DATABASE, DROP DATABASE, CREATE TABLE, and DROP TABLE. These Hive interview questions and answers can be your gateway to your next job as a Hive expert.
These are basic Hive interview questions and answers for both freshers and experienced candidates.
Q1: What kind of applications are supported by Apache Hive?
A1: Hive supports all those client applications that are written in Java, PHP, Python, C++ or Ruby by exposing its Thrift server.
Q2: What is the difference between Hive and HBase?
A2: The key differences between Apache Hive and HBase are as follows:
- Hive is a data warehousing infrastructure, whereas HBase is a NoSQL database on top of Hadoop.
- Apache Hive queries are executed internally as MapReduce jobs (newer versions can also use Tez or Spark), whereas HBase operations run in real time against its database rather than as MapReduce jobs.
Q3: Where does the data of a Hive table get stored?
A3: By default, Hive table data is stored in the HDFS directory /user/hive/warehouse. You can change this by setting the desired directory in the hive.metastore.warehouse.dir configuration parameter in hive-site.xml.
Q4: What is a metastore in Hive?
A4: The metastore in Hive stores metadata in an RDBMS. It uses an open-source ORM (Object Relational Mapping) layer called DataNucleus, which converts the object representation into a relational schema and vice versa.
Q5: Why does Hive not store metadata information in HDFS?
A5: Hive stores metadata in the metastore's RDBMS instead of HDFS to achieve low latency. Metadata lookups and updates are small, frequent operations, and HDFS read/write operations are too time-consuming for them.
Q6: What is the difference between local and remote metastore?
A6: Local Metastore:
In local metastore configuration, the metastore service runs in the same JVM in which the Hive service is running and connects to a database running in a separate JVM, either on the same machine or on a remote machine.
Remote Metastore:
In the remote metastore configuration, the metastore service runs in its own separate JVM, not in the Hive service JVM. Other processes communicate with the metastore server using Thrift network APIs. You can run one or more metastore servers in this case to provide higher availability.
Q7: What is the default database provided by Apache Hive for metastore?
A7: By default, Hive provides an embedded Derby database instance backed by the local disk for the metastore. This is called the embedded metastore configuration.
Q8: Is it possible to change the default location of a managed table?
A8: Yes, it is possible to change the default location of a managed table. You can do this with the LOCATION '<hdfs_path>' clause.
Q9: When should we use SORT BY instead of ORDER BY?
A9: We should use SORT BY instead of ORDER BY when we have to sort huge datasets. SORT BY sorts the data using multiple reducers, whereas ORDER BY sorts all of the data together using a single reducer, so running ORDER BY on a large input can take a very long time. Note that SORT BY only sorts the rows within each reducer, so the combined output is not totally ordered.
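To make the difference concrete, here is a minimal Python simulation, not Hive code. It assumes, purely for illustration, that rows are routed to reducers by `hash(row) % num_reducers`. The point it shows is that each reducer's output is sorted, but the outputs taken together are generally not.

```python
# Conceptual simulation only (not Hive code). Assumption: rows are routed to
# reducers by hash(row) % num_reducers, which is one possible distribution.


def order_by(rows):
    """ORDER BY: a single reducer sees every row, so the output is totally ordered."""
    return sorted(rows)


def sort_by(rows, num_reducers):
    """SORT BY: each reducer sorts only its own share of the rows.

    Returns one sorted list per reducer (one output file each in Hive).
    """
    parts = [[] for _ in range(num_reducers)]
    for row in rows:
        parts[hash(row) % num_reducers].append(row)  # hypothetical routing
    return [sorted(p) for p in parts]


rows = [42, 7, 19, 3, 88, 14, 60, 20]
print(order_by(rows))  # one globally sorted list
for i, part in enumerate(sort_by(rows, 3)):
    print(f"reducer {i}: {part}")  # sorted within each reducer only
```

If you need globally ordered output from SORT BY, you would have to merge the per-reducer results afterwards. That is exactly the work ORDER BY does on its single reducer.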
Q10: What is a partition in Hive?
A10: Hive organizes tables into partitions to group similar types of data together based on a column or partition key. Each table can have one or more partition keys to identify a particular partition. Physically, a partition is nothing but a sub-directory in the table directory.
Q11: Why do we perform partitioning in Hive?
A11: Partitioning provides granularity in a Hive table. It therefore reduces query latency by scanning only the relevant partitioned data instead of the whole data set.
For example, we can partition the transaction log of an e-commerce website by month (January, February, and so on). Any analytics for a particular month, say January, then only needs to scan the January partition (sub-directory) instead of the whole table.
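Partition pruning can be pictured with a short Python sketch. It is an illustration, not Hive's implementation. A partitioned table is modelled as a dict from Hive-style `key=value` sub-directory names (such as `month=Jan`) to the rows stored there. A query filtered on the partition key reads only the matching directory. The sample rows and the helper names `partition_rows` and `scan` are hypothetical.

```python
# Conceptual sketch (not Hive code): a partitioned table modelled as a mapping
# from partition sub-directory names to the rows stored in each one.
from collections import defaultdict


def partition_rows(rows, key):
    """Group rows into Hive-style 'key=value' sub-directories."""
    table = defaultdict(list)
    for row in rows:
        table[f"{key}={row[key]}"].append(row)
    return dict(table)


def scan(table, key, value):
    """Partition pruning: read only the directory that matches the filter."""
    return table.get(f"{key}={value}", [])


transactions = [  # made-up sample rows
    {"order_id": 1, "month": "Jan", "amount": 120},
    {"order_id": 2, "month": "Feb", "amount": 80},
    {"order_id": 3, "month": "Jan", "amount": 45},
]
table = partition_rows(transactions, "month")
print(sorted(table))                     # ['month=Feb', 'month=Jan']
print(len(scan(table, "month", "Jan")))  # 2: the Feb directory is never read
```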
Q12: What is dynamic partitioning and when is it used?
A12: In dynamic partitioning, the values of the partition columns are known only at runtime, that is, while the data is being loaded into a Hive table.
Dynamic partitioning is useful in the following two cases:
- Loading data from an existing non-partitioned table, to improve sampling and therefore decrease query latency.
- When the partition values are not all known beforehand, so finding them manually in a huge data set would be a tedious task.
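The following Python sketch is a conceptual model, not Hive code. It shows how dynamic partitioning discovers partitions from the data itself while loading. Each row's value of the partition column decides its target directory, and that column is dropped from the stored row, since in Hive the partition value lives in the directory name. The staging rows and the `load_dynamic` helper are hypothetical. In Hive itself, dynamic partitioning is typically enabled with `hive.exec.dynamic.partition=true` and, when no static partition column is given, `hive.exec.dynamic.partition.mode=nonstrict`.

```python
# Conceptual sketch of dynamic partitioning (not Hive code): partition values
# are taken from each row while loading, so partitions need not be known upfront.
from collections import defaultdict


def load_dynamic(rows, partition_col):
    """Create one partition per distinct value that appears in the data."""
    partitions = defaultdict(list)
    for row in rows:
        value = row[partition_col]
        # The partition column is not stored in the data file: in Hive its
        # value is encoded in the directory name instead.
        stored = {k: v for k, v in row.items() if k != partition_col}
        partitions[f"{partition_col}={value}"].append(stored)
    return dict(partitions)


staging = [  # rows from a hypothetical non-partitioned staging table
    {"user": "a", "country": "IN"},
    {"user": "b", "country": "US"},
    {"user": "c", "country": "IN"},
]
print(sorted(load_dynamic(staging, "country")))  # ['country=IN', 'country=US']
```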
Q13: How to change the column data type in Hive? Explain RLIKE in Hive.
A13: We can change the column data type by using ALTER and CHANGE.
The syntax is:
ALTER TABLE table_name CHANGE old_column_name new_column_name new_datatype;
Example: To change the data type of the salary column in the employee table from INT to BIGINT:
ALTER TABLE employee CHANGE salary salary BIGINT;
RLIKE: RLIKE is a regular-expression matching operator in Hive (a synonym of REGEXP). A RLIKE B evaluates to true if any substring of A matches the Java regular expression B.
Example:
'Intellipaat' RLIKE 'tell' returns TRUE
'Intellipaat' RLIKE '^I.*' returns TRUE (an anchored regular expression)
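The semantics of RLIKE can be approximated in Python with `re.search`, which looks for a match anywhere in the string. This is a sketch: Python's `re` module stands in for Java regular expressions, so advanced patterns may behave differently. The NULL handling follows Hive's rule that the result is NULL if either operand is NULL.

```python
import re
from typing import Optional


def rlike(a: Optional[str], b: Optional[str]) -> Optional[bool]:
    """Approximate Hive's `A RLIKE B`.

    None (NULL) if either side is NULL; otherwise True if any substring of `a`
    matches the regular expression `b`. Python regexes stand in for Java ones.
    """
    if a is None or b is None:
        return None
    return re.search(b, a) is not None


print(rlike("Intellipaat", "tell"))   # True: 'tell' occurs inside the string
print(rlike("Intellipaat", "^I.*"))   # True: the anchored pattern matches from the start
print(rlike("Intellipaat", "^tell"))  # False: the string does not start with 'tell'
```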
Q14: What are the components used in Hive query processor?
A14: The components of a Hive query processor include:
- Logical Plan Generation
- Physical Plan Generation
- Execution Engine
- Operators
- UDFs and UDAFs
- Optimizer
- Parser
- Semantic Analyzer
- Type Checking
Q15: What are buckets in Hive?
A15: Buckets further divide the data of a table (or of each partition) into a fixed number of parts. Each row is assigned to a bucket based on a hash of the chosen bucketing column(s), and each bucket is stored as a separate file.
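The hash-based assignment can be sketched in Python. This sketch assumes, for illustration only, an INT bucketing column whose hash is the value itself, with the bucket number computed as `(hash & 0x7FFFFFFF) % num_buckets`. Hive's real hash function depends on the column type and version. In HiveQL, the bucket count is declared with `CLUSTERED BY (col) INTO N BUCKETS`. The point is that equal values always land in the same bucket, which helps joins and sampling.

```python
# Conceptual sketch of bucketing. Assumption: for an INT column the hash is the
# value itself; Hive's actual hash function varies by type and version.


def bucket_for(value: int, num_buckets: int) -> int:
    """Map an integer bucketing-column value to a bucket number."""
    return (value & 0x7FFFFFFF) % num_buckets  # mask keeps the result non-negative


def bucketize(ids, num_buckets):
    """Group values by bucket; in Hive each bucket would be one file."""
    buckets = {i: [] for i in range(num_buckets)}
    for v in ids:
        buckets[bucket_for(v, num_buckets)].append(v)
    return buckets


print(bucketize([101, 102, 103, 104, 105, 106], 4))
# {0: [104], 1: [101, 105], 2: [102, 106], 3: [103]}
```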
Q16: Explain the process to access subdirectories recursively in Hive queries.
A16: We can access subdirectories recursively in Hive by using the following commands:
hive> Set mapred.input.dir.recursive=true;
hive> Set hive.mapred.supports.subdirectories=true;
With these settings, a Hive table can point to a higher-level directory, which suits a directory structure such as /data/country/state/city/.
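The effect of the two recursive settings above can be pictured with a small Python sketch. It models only which files a table scan would see, not Hive's actual input handling. A flat listing of the table directory finds nothing when the data sits in nested `country/state/city` folders, while a recursive walk finds it. The helper `input_files` and the sample directory names are hypothetical.

```python
import os
import tempfile


def input_files(table_dir, recursive):
    """List the files a table scan would read under `table_dir` (conceptual).

    recursive=False only looks at files directly inside the table directory;
    recursive=True walks every sub-directory, like the recursive settings.
    """
    if not recursive:
        return sorted(
            os.path.join(table_dir, name)
            for name in os.listdir(table_dir)
            if os.path.isfile(os.path.join(table_dir, name))
        )
    found = []
    for root, _dirs, files in os.walk(table_dir):
        found.extend(os.path.join(root, name) for name in files)
    return sorted(found)


with tempfile.TemporaryDirectory() as data:
    city_dir = os.path.join(data, "india", "up", "noida")  # country/state/city
    os.makedirs(city_dir)
    with open(os.path.join(city_dir, "part-0000"), "w") as f:
        f.write("row\n")
    flat = input_files(data, recursive=False)
    deep = input_files(data, recursive=True)

print(len(flat))  # 0: no files at the top level
print(len(deep))  # 1: the file under .../noida is found
```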
Q17: How to skip header rows from a table in Hive?
A17: Consider a log file that begins with these header records:
System=….
Version=…
Sub-version=….
These three header lines should not be included in Hive query results. To skip them, set the skip.header.line.count table property in the table definition:
CREATE EXTERNAL TABLE employee (
name STRING,
job STRING,
dob STRING,
id INT,
salary INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' STORED AS TEXTFILE
LOCATION '/user/data'
-- skip the three header lines at the top of the file
TBLPROPERTIES ("skip.header.line.count"="3");
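What the table property does for a text file can be sketched in Python: drop the first N lines, then split each remaining line on the field delimiter. This is a conceptual model only. The helper name `read_skipping_header` and the sample log contents are made up, with placeholder values standing in for the header fields.

```python
import io


def read_skipping_header(lines, header_count):
    """Yield data lines after skipping the first `header_count` lines,
    mirroring the effect of skip.header.line.count on a text file."""
    for i, line in enumerate(lines):
        if i >= header_count:
            yield line.rstrip("\n")


log = io.StringIO(  # made-up sample file: three header lines, then one record
    "System=example\n"
    "Version=1\n"
    "Sub-version=0\n"
    "alice engineer 1990-01-01 1 5000\n"
)
records = [line.split(" ") for line in read_skipping_header(log, 3)]
print(records)  # [['alice', 'engineer', '1990-01-01', '1', '5000']]
```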
Q18: What is the maximum size of the string data type supported by Hive? Which binary formats does Hive support?
A18: The maximum size of the string data type supported by Hive is 2 GB.
Hive supports the text file format by default, and it supports the binary formats Sequence files, ORC files, Avro data files, and Parquet files.
Sequence files: a general-purpose binary format that is splittable, compressible, and row-oriented.
ORC files: ORC stands for Optimized Row Columnar. It evolved from the RCFile (Record Columnar File) format and is a column-oriented storage format. It divides the table into row splits, and within each split it stores all values of the first column, followed by all values of the second column, and so on.
Avro data files: like Sequence files, they are splittable, compressible, and row-oriented, but they additionally support schema evolution and bindings for multiple languages.
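The difference between a row-oriented layout and the columnar layout described for ORC can be shown in a few lines of Python. This is a simplified model: real ORC files add encodings, compression, indexes, and stripe footers. Here a "row split" is just a fixed number of consecutive rows, and the sample rows are made up. In the columnar layout, reading one column touches a contiguous run of values within each split.

```python
rows = [  # made-up sample rows: (id, name, salary)
    (1, "alice", 5000),
    (2, "bob", 6200),
    (3, "carol", 7100),
]


def row_layout(rows):
    """Row-oriented (text, Sequence, Avro): values of one row stay together."""
    return [value for row in rows for value in row]


def columnar_layout(rows, rows_per_split):
    """Columnar (RCFile/ORC style, simplified): within each row split, store
    all values of column 1, then all values of column 2, and so on."""
    out = []
    for start in range(0, len(rows), rows_per_split):
        split = rows[start:start + rows_per_split]
        for column in zip(*split):  # transpose the split into columns
            out.extend(column)
    return out


print(row_layout(rows))
# [1, 'alice', 5000, 2, 'bob', 6200, 3, 'carol', 7100]
print(columnar_layout(rows, rows_per_split=3))
# [1, 2, 3, 'alice', 'bob', 'carol', 5000, 6200, 7100]
```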
Q19: What is the precedence order of HIVE configuration?
A19: Hive applies properties using the following precedence hierarchy, from highest to lowest:
- The SET command in Hive
- The command-line --hiveconf option
- hive-site.xml
- hive-default.xml
- hadoop-site.xml
- hadoop-default.xml
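A layered lookup like this maps neatly onto Python's `collections.ChainMap`, which searches its dicts from left to right. The sketch below lists the layers from highest to lowest precedence. The property name `example.reducers` and all the values are hypothetical, chosen only to show which layer wins.

```python
from collections import ChainMap

# Hypothetical layers and values for illustration; "example.reducers" is not a
# real Hive property.
hadoop_default = {"example.reducers": "1"}
hadoop_site = {}
hive_default = {}
hive_site = {"example.reducers": "4"}
hiveconf_cli = {}                          # values passed with --hiveconf
set_command = {"example.reducers": "8"}    # SET issued in the session

# ChainMap returns the first match, so order the layers by precedence.
config = ChainMap(set_command, hiveconf_cli, hive_site,
                  hive_default, hadoop_site, hadoop_default)
print(config["example.reducers"])  # 8: the SET command wins
```

Because `ChainMap` is a live view, clearing a higher layer immediately exposes the value from the next layer down.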
Q20: If you run a SELECT * query in Hive, why does it not run MapReduce?
A20: The hive.fetch.task.conversion property lets Hive avoid the latency of MapReduce overhead. For simple queries such as SELECT, FILTER, and LIMIT, Hive converts the query into a fetch task that reads the data directly and skips MapReduce.
Q21: How can Hive improve performance with ORC-format tables?
A21: We can store Hive data in a highly efficient manner in the Optimized Row Columnar (ORC) file format, which overcomes many limitations of the other Hive file formats. Using ORC files improves performance when reading, writing, and processing data. For example:
SET hive.compute.query.using.stats=true;
SET hive.stats.dbclass=fs;
-- ORC is a binary columnar format with its own SerDe, so delimiter clauses are not needed
CREATE TABLE orc_table (
id INT,
name STRING)
STORED AS ORC;
Q22: Explain the functionality of Object-Inspector.
A22: The ObjectInspector helps analyze the internal structure of a row object and the structure of individual columns in Hive. It also provides a uniform way to access complex objects that can be stored in memory in multiple formats:
- An instance of a Java class (Thrift or native Java)
- A standard Java object (for example, java.util.List or java.util.Map)
- A lazily initialized object
The ObjectInspector describes the structure of an object and also provides ways to access the internal fields inside it.
Q23: Why is a new metastore_db created whenever we run a Hive query?
A23: A local metastore is created when we run Hive in embedded mode. Before creating it, Hive checks whether the metastore already exists. The metastore location is defined in hive-site.xml by the property javax.jdo.option.ConnectionURL, whose default value is "jdbc:derby:;databaseName=metastore_db;create=true". Because databaseName is a relative path, a new metastore_db is created in whichever directory Hive is started from. To fix this, change databaseName to an absolute path, so that the metastore at that location is always used.
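Why the relative path matters can be sketched in Python. The snippet does a simplified parse of the Derby connection URL and resolves `databaseName` against the directory Hive was launched from. It assumes that `derby.system.home` is not set, so relative names resolve against the working directory. The absolute path `/var/lib/hive/metastore_db` and the helper `metastore_dir` are hypothetical examples.

```python
import posixpath


def metastore_dir(connection_url: str, launch_dir: str) -> str:
    """Return the directory Derby would use for the metastore (simplified).

    Parses URLs like 'jdbc:derby:;databaseName=metastore_db;create=true' and
    resolves a relative databaseName against the launch directory.
    """
    params = dict(
        part.split("=", 1) for part in connection_url.split(";") if "=" in part
    )
    name = params["databaseName"]
    return name if posixpath.isabs(name) else posixpath.join(launch_dir, name)


default_url = "jdbc:derby:;databaseName=metastore_db;create=true"
fixed_url = "jdbc:derby:;databaseName=/var/lib/hive/metastore_db;create=true"  # hypothetical

print(metastore_dir(default_url, "/home/alice"))  # /home/alice/metastore_db
print(metastore_dir(default_url, "/tmp"))         # /tmp/metastore_db: a second, separate metastore
print(metastore_dir(fixed_url, "/tmp"))           # /var/lib/hive/metastore_db
```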
Q24: Differentiate between Hive and HBase.
A24:
Hive | HBase |
Supports most SQL queries (through HiveQL) | Does not support SQL queries natively |
Traditionally does not support record-level insert, update, and delete operations (transactional ORC tables add them from Hive 0.14) | Supports record-level insert, update, and delete operations |
Is a data warehouse framework | Is a NoSQL database |
Runs on top of MapReduce (or Tez/Spark in newer versions) | Runs on top of HDFS |
Q26: What is explode used for in Hive?
A26: explode is a built-in table-generating function. It takes an array (or a map) as input and outputs each element as a separate table row. Hadoop developers use explode to convert complex data types into the desired table format.
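The behaviour of explode can be mimicked in Python. An array yields one single-column row per element, a map yields one (key, value) row per entry, and NULL or empty input yields no rows. The last lines imitate what `LATERAL VIEW explode(...)` does in HiveQL, pairing each exploded value with the other columns of its source row. The sample data is made up.

```python
from typing import Any, Dict, List, Tuple, Union


def explode(value: Union[List[Any], Dict[Any, Any], None]) -> List[Tuple[Any, ...]]:
    """Mimic Hive's explode(): one row per array element, or one
    (key, value) row per map entry. NULL or empty input yields no rows."""
    if not value:
        return []
    if isinstance(value, dict):
        return [(k, v) for k, v in value.items()]
    return [(element,) for element in value]


print(explode(["hive", "hbase", "pig"]))  # [('hive',), ('hbase',), ('pig',)]
print(explode({"a": 1, "b": 2}))          # [('a', 1), ('b', 2)]

# LATERAL VIEW style: join each source row with its exploded values.
people = [("alice", ["hive", "spark"]), ("bob", ["hbase"])]  # made-up rows
lateral = [(name,) + out for name, skills in people for out in explode(skills)]
print(lateral)  # [('alice', 'hive'), ('alice', 'spark'), ('bob', 'hbase')]
```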
Q27: What mechanisms are available for connecting applications when we run Hive as a server?
A27: Thrift Client: Using Thrift, you can call Hive commands from various programming languages, for example C++, PHP, Java, Python, and Ruby.
JDBC Driver: Hive provides a Type 4 (pure Java) JDBC driver.
ODBC Driver: ODBC Driver supports the ODBC protocol.
Q28: How do we write our own custom SerDe?
A28: In most cases, users only want to read their own data format rather than write it, so it is enough to write a Deserializer instead of a full SerDe.
Example: The RegexDeserializer deserializes the data using the configuration parameter 'regex' and a list of column names.
If our SerDe supports DDL (that is, parameterized columns and column types), we probably want to implement a protocol based on DynamicSerDe instead of writing a SerDe from scratch. The framework passes DDL to the SerDe in "thrift DDL" format, and writing a "thrift DDL" parser is non-trivial.
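The idea behind a regex-based deserializer can be sketched in Python. Each capture group of a configured regex becomes one column, named in order by a list of column names. This is a conceptual model only. In Hive itself, a deserializer is a Java class implementing the Deserializer interface. The class name `RegexLineDeserializer`, the log pattern, and the choice to return None for non-matching lines are all hypothetical.

```python
import re
from typing import Dict, List, Optional


class RegexLineDeserializer:
    """Hypothetical, simplified read-only deserializer: each capture group of
    `regex` becomes one column, named in order by `columns`."""

    def __init__(self, regex: str, columns: List[str]):
        self.pattern = re.compile(regex)
        self.columns = columns
        if self.pattern.groups != len(columns):
            raise ValueError("need exactly one capture group per column")

    def deserialize(self, line: str) -> Optional[Dict[str, str]]:
        match = self.pattern.fullmatch(line)
        if match is None:
            return None  # our choice: treat non-matching lines as unparseable
        return dict(zip(self.columns, match.groups()))


d = RegexLineDeserializer(r"(\S+) (\S+) \[(.*?)\]", ["host", "user", "time"])
print(d.deserialize("10.0.0.1 alice [01/Jan/2024:10:00:00]"))
# {'host': '10.0.0.1', 'user': 'alice', 'time': '01/Jan/2024:10:00:00'}
```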
Q29: What is the date data type in Hive? Name Hive's collection data types.
A29: The TIMESTAMP data type stores dates in java.sql.Timestamp format. Hive 0.12 and later also provide a DATE type.
Hive has three collection data types:
- ARRAY
- MAP
- STRUCT
Q30: Can we run UNIX shell commands from Hive? Can Hive queries be executed from script files? How? Give an example.
A30: Yes. We can run UNIX shell commands from Hive by putting ! before the command and ending it with a semicolon. For example, !pwd; at the hive prompt prints the current directory.
We can execute Hive queries from script files by using the source command.
Example:
hive> source /path/to/file/file_w
Hive Interview FAQs: Conclusion
We know this list of Hive interview questions and answers for freshers and experienced candidates is long, but reading all the questions will maximize your potential and help you crack the interview. This post covers the basics of Hive technology. You should also check out the FAQs for the different components of Hive.
In the interview, you can expect questions related to the ones above. Preparing and understanding all the concepts of Hive will also strengthen your grasp of the related details around the topic.
After preparing these interview questions, we recommend a mock interview before facing the real one. You can take the help of a friend or a Hive expert to find the gaps in your skills and knowledge. A mock interview will also let you practice and improve your communication skills, which play a vital role in getting placed and earning a high salary.
Remember, in the interview, the company or examiner often checks your basic knowledge of the subject. If your basics are covered and strong, you can land the job of your dreams. Industry experts understand that if a candidate's foundation is already solid, it is easy for the company to train the employee in advanced skills. Without the basics, learning the subject has little value.
Therefore, it's never too late to brush up on the basics of any technology. If you think you have not acquired enough skills, you can join our upcoming batch of Hive Training in Noida. We are one of the best institutes for Hive in Noida, providing advanced learning through our Hive course. Highly qualified professionals work with us, and we promise top-quality education to our students.
We hope you enjoyed reading these Hive interview questions and answers and the associated FAQs. Do not forget to revise all of them before your Hive interview. If you have any doubt or query about Hive, you can contact us anytime, and we will be happy to help you at our earliest convenience. Finally, we wish you all the best for your upcoming Hive interview.