Introduction
Carlo Strozzi coined the term NoSQL in 1998. NoSQL refers to a non-SQL or non-relational data management system that provides a mechanism for storing and retrieving data. The main reason behind the popularity of NoSQL is its capability to store and handle structured, semi-structured, unstructured, and polymorphic data. NoSQL is hugely popular in big data and real-time web apps, and its use keeps growing. For example, companies like Google, Twitter, and Facebook collect terabytes of user data daily. Let’s take a look at some of the interview questions on NoSQL.
Learning Objectives
The learning objectives of this blog include the following:
1. A common understanding of NoSQL, its features, and how it differs from relational databases.
2. Knowledge of the CAP theorem, scalability, and normalization in NoSQL.
3. Understanding Big SQL, Impala, and Polyglot persistence in NoSQL.
This article was published as a part of the Data Science Blogathon.
Multiple-Choice Interview Questions
Q1. Among the types of NoSQL databases, which is the simplest?
- Key-value
- Wide-column
- Document
- All of the above
Answer: Key-value
Explanation: A key-value store is considered the simplest NoSQL database because it stores every item as an attribute name (the “key”) together with its value, which makes lookups easy.
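To make the key-value idea concrete, here is a minimal in-memory sketch. The class name KeyValueStore and the key naming scheme are assumptions made for illustration; real key-value databases add persistence, expiry, and networking on top of this basic contract.

```python
class KeyValueStore:
    """Hypothetical in-memory key-value store used only for illustration."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        # Store (or overwrite) the value under the given key.
        self._items[key] = value

    def get(self, key, default=None):
        # Fetch the value by key; return the default when the key is absent.
        return self._items.get(key, default)

    def delete(self, key):
        # Remove the key and report whether it existed.
        if key in self._items:
            del self._items[key]
            return True
        return False


store = KeyValueStore()
store.put("user:42:name", "Asha")
store.put("user:42:city", "Pune")

name = store.get("user:42:name")
missing = store.get("user:99:name")
```

Every operation goes through the key, which is why this model is the easiest to reason about and to fetch from.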
Q2. “Sharding” a database across many server instances can be achieved with _______________.
- LAN
- SAN
- MAN
- All of the above
Answer: SAN
Explanation: A storage area network (SAN) and other complex arrangements can make the hardware behave like a single server.
Q3. Choose the correct example of a wide-column store.
- Cassandra
- Riak
- MongoDB
- Redis
Answer: Cassandra
Explanation: Wide-column stores such as Cassandra and HBase handle queries over huge volumes of data efficiently by storing data in columns instead of rows.
Q4. Which option is used to distribute data across multiple servers?
- Partitioning
- Bucketing
- Sharding
- None of the above
Answer: Sharding
Explanation: Sharding is a popular process where a shard key is used to split the data into ranges and distribute it across various shards.
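As a rough sketch of range-based sharding, the snippet below maps a shard key to a shard by looking up which range it falls into. The split points, shard names, and the choice of a numeric user id as the shard key are all assumptions made for illustration.

```python
from bisect import bisect_right

# Hypothetical split points: each value is the first key of the next range.
SPLIT_POINTS = [1000, 2000, 3000]
# One more shard than split points, so every key lands somewhere.
SHARDS = ["shard-a", "shard-b", "shard-c", "shard-d"]


def shard_for(user_id: int) -> str:
    """Return the shard whose key range contains user_id."""
    return SHARDS[bisect_right(SPLIT_POINTS, user_id)]


# Group a few sample keys by the shard they would be stored on.
placement = {}
for user_id in [15, 1000, 2500, 9999]:
    placement.setdefault(shard_for(user_id), []).append(user_id)
```

Keys below 1000 go to the first shard, keys from 1000 up to 1999 go to the second, and so on; anything at or above the last split point lands on the final shard.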
Q5. Having multiple machines for storing files, MongoDB can be used as a ____________, to take advantage of data replication and load-balancing features.
- AMS
- CMS
- File System
- None of the above
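Answer: File System
Explanation: MongoDB can be used as a file system through GridFS, which stores files across multiple machines and takes advantage of MongoDB’s data replication and load-balancing features.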
Q6. Choose the correct type of NoSQL database.
- SQL
- Document Databases
- JSON
- All of the above
Answer: Document Databases
Explanation: In document databases, data is stored as key-value pairs in which every key is paired with a complex data structure called a document.
Q7. MongoDB, a popular type of NoSQL, is used by many firms as ________ software to build websites and offer services.
- Frontend
- Backend
- Proprietary
- All of the above
Answer: Backend
Explanation: MongoDB is one of the most popular NoSQL databases. It stores data on the backend and serves the frontend systems that rely on it.
Q8. In NoSQL, we can use _____________ to process batch data and perform aggregation operations.
- Hive
- MapReduce
- Oozie
- None of the above
Answer: MapReduce
Explanation: MapReduce combines a map step and a reduce step, which gives users a framework for aggregating data in batches.
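The map and reduce steps can be sketched in plain Python as follows. This is only an illustration of the pattern, not any database’s MapReduce API; the sample orders and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical batch of order records to aggregate.
orders = [
    {"customer": "A", "amount": 30},
    {"customer": "B", "amount": 20},
    {"customer": "A", "amount": 50},
]


def map_order(order):
    # Map step: emit one (key, value) pair per record.
    yield order["customer"], order["amount"]


def reduce_amounts(customer, amounts):
    # Reduce step: collapse all values for one key into a single result.
    return customer, sum(amounts)


def map_reduce(records, mapper, reducer):
    # Shuffle step: group every emitted value under its key.
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    # Apply the reducer once per key.
    return dict(reducer(key, values) for key, values in grouped.items())


totals = map_reduce(orders, map_order, reduce_amounts)
```

Here totals holds the summed order amount per customer. In a distributed system the map and reduce calls would run on different nodes, but the data flow is the same.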
Q9. Which of the following sorting techniques does MongoDB not support?
- Collation
- Collection
- Heap
- None of the above
Answer: Collation
Explanation: Older MongoDB releases could not sort strings using language-specific collation rules. Collation support was added in MongoDB 3.4, so this answer applies only to earlier versions.
Q10. In MongoDB, the dynamic schema feature is used to make ____________ easier for applications.
- Inheritance
- Polymorphism
- Encapsulation
- None of the above
Answer: Polymorphism
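Explanation: With a dynamic schema, MongoDB does not require a schema to be defined before data is added, so documents of different shapes can live in the same collection, which makes polymorphism easier.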
Q11. Which of the following is not a feature of NoSQL databases?
- Relational Data
- Scalability
- Data can easily be held across multiple servers
- Faster data access than SQL databases
Answer: Relational Data
Explanation: NoSQL databases are popular for storing unstructured data, whereas relational data is highly structured.
Q12. Among the following statements, choose the right one with respect to MongoDB.
- MongoDB is a NoSQL Database
- For data exchange, MongoDB prefers XML over JSON
- MongoDB isn’t scalable
- All of the above
Answer: MongoDB is a NoSQL Database
Explanation: MongoDB is a highly scalable NoSQL database that exchanges data as JSON-like documents rather than XML.
Q13. The _id field generated by the system is a __________.
- 12-byte hexadecimal value
- 16-byte octal value
- 12-byte decimal value
- 10-byte binary value
Answer: 12-byte hexadecimal value
Explanation: The default value of the _id field is an ObjectId, a 12-byte value that is usually displayed as a 24-character hexadecimal string.
Q14. Which type of database is best suited for a Load Frequency Control (LFC) system where most of the data is stored in the same manner?
- Relational
- NoSQL
- Both A and B can be used
- None of the above
Answer: NoSQL
Explanation: NoSQL can store data efficiently and speed up the LFC system.
Q15. Documents in the same collection do not require the same structure or fields, and common fields in a collection’s documents may hold different types of data. What is this known as?
- Dynamic Schema
- MongoDB
- Mongo
- Embedded Documents
Answer: Dynamic Schema
Explanation: Dynamic schema means that documents in the same collection do not need the same fields or structure. Common fields in a collection’s documents may hold different types of data.
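A small sketch of what a dynamic schema allows, using plain Python dictionaries as stand-ins for documents in one collection. The sample documents and the field_types helper are hypothetical and exist only to show that fields and their types can differ from one document to the next.

```python
# Three documents in the same (hypothetical) collection with different shapes.
shapes = [
    {"_id": 1, "kind": "circle", "radius": 2.0},
    {"_id": 2, "kind": "rectangle", "width": 3.0, "height": 4.0},
    {"_id": 3, "kind": "label", "radius": "n/a"},  # same field, different type
]


def field_types(documents):
    """Collect, for every field name, the set of value types seen across documents."""
    types = {}
    for doc in documents:
        for field, value in doc.items():
            types.setdefault(field, set()).add(type(value).__name__)
    return types


summary = field_types(shapes)
```

In summary, the radius field maps to both float and str, while width appears in only one document. A relational table would reject both situations without a schema change.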
Q16. Choose the most suitable size for Redis keys.
- Medium
- Short
- Single Bit
- Long
Answer: Short
Explanation: Short keys use less memory and are cheaper to compare during lookups, so very long Redis keys are best avoided. The key should still be readable enough to describe what it holds.
Q17. Which data types should you avoid when designing a Google Bigtable row key?
- Multi-valued identifiers
- String identifiers
- Timestamps
- Frequently updated identifiers
Answer: Frequently updated identifiers
Explanation: A Google Bigtable row key should not be built from frequently updated identifiers, because data keyed this way cannot be stored efficiently.
Detailed Interview Questions
Q1. Explain the concept of NoSQL databases.
NoSQL stands for “Not Only SQL.” It is a type of database designed to handle massive amounts of unstructured, semi-structured, and structured data. It emerged when traditional databases struggled to provide seamless data services at scale, and it proved highly scalable and flexible for handling the big data produced in the real world. This allows multinational companies, like Google and Facebook, to deliver cloud-based services that store data in real time.
Q2. How can you track data record relations in NoSQL?
To track relations between data records in NoSQL, the usual steps are:
A. First, embed the related data in the parent record, for example by storing it inside a user object.
B. Then, give each user a unique user id so that the record can be looked up and referenced from other records.
C. Finally, store the user’s comments under a comments value that holds a list of comments; reading that value displays the related records.
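One way to picture the relation between a user and their comments is sketched below, first with the comments embedded in the user object and then with comments that reference a user id. The sample data and the comments_for helper are hypothetical.

```python
# Option 1: embed the related records inside the user object.
user_embedded = {
    "_id": "u1",
    "name": "Asha",
    "comments": [{"text": "Nice post"}, {"text": "Thanks"}],
}

# Option 2: keep comments separate and reference the user by id.
users = {"u1": {"_id": "u1", "name": "Asha"}}
comments = [
    {"_id": "c1", "user_id": "u1", "text": "Nice post"},
    {"_id": "c2", "user_id": "u2", "text": "Hello"},
    {"_id": "c3", "user_id": "u1", "text": "Thanks"},
]


def comments_for(user_id):
    """Follow the reference: return the text of every comment written by user_id."""
    return [c["text"] for c in comments if c["user_id"] == user_id]


embedded_texts = [c["text"] for c in user_embedded["comments"]]
referenced_texts = comments_for("u1")
```

Embedding returns everything in a single read, while referencing avoids duplicating data at the cost of a second lookup.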
Q3. Illustrate the various features of NoSQL.
Below are some essential features of NoSQL:
- Storage: NoSQL offers high storage capacity for structured, semi-structured, and unstructured data. Because it is schema-free, heterogeneous data can be stored in a single domain.
- Agile development: The flexible schema supports agile sprints and quick iterations, which helps teams deliver a workable project.
- Object-oriented: NoSQL data models map naturally onto objects in object-oriented programming, which makes them easy to use and well suited to web applications.
- Cost: NoSQL supports a scale-out architecture, which is cost-effective and efficient.
Q4. Explain the concept of the aggregate-oriented database.
Aggregate-oriented databases play a significant role in reducing computation and managing storage across a cluster. As the name suggests, an aggregate is a collection of related data that is stored, retrieved, and updated as a single unit, typically looked up by a key. The aggregate is also the unit for atomic operations, so a change to one aggregate is applied as a whole.
Q5. How can we differentiate NoSQL and traditional RDBMS?
Both NoSQL and relational database management systems (RDBMS) store data, but they differ in the following ways:
- Storage mechanism: NoSQL can store semi-structured and unstructured data in key-value, document, column, or graph format, while an RDBMS stores structured data in tables.
- Data format: NoSQL has no predefined data format, which makes it very flexible, while an RDBMS stores only well-organized structured data that fits a predefined schema.
- Scalability: NoSQL is generally easier to scale horizontally than an RDBMS.
- Querying: Because joins are usually unavailable, querying in NoSQL is more limited, while an RDBMS offers rich querying through Structured Query Language (SQL).
Q6. Is the concept of normalization used in NoSQL?
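Yes, normalization can be used in NoSQL to prevent data redundancy and loss, but it is applied far less strictly than in relational databases. Many NoSQL data models, including those typically used with Apache Cassandra, store data in a series of tables designed around specific queries and accept some duplication, because joins are unavailable or expensive.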
Q7. What are the different types of NoSQL databases available?
Below are the types of NoSQL databases:
- Key-Value Pair Database: Keys are used to access the stored values.
Examples: Redis, Riak, and Oracle NoSQL.
- Document-Oriented Database: This type is preferred for storing hierarchical data structures directly in the database.
Examples: ArangoDB, Azure Cosmos DB, and MongoDB.
- Graph Database: This type stores relationship-intensive data.
Examples: Neo4j, Oracle NoSQL, and Graph Base.
- Column-Oriented Database: It acts as a sparse matrix system and uses columns as keys.
Examples: Apache Cassandra, ScyllaDB, and Microsoft Azure Cosmos DB.
Q8. What is the CAP theorem in NoSQL?
Eric Brewer proposed the CAP theorem in 2000. It states that a distributed data store can guarantee at most two of the following three properties at the same time. CAP stands for:
- Consistency: Every node sees the exact same data at the same time.
- Availability: Every request receives a response.
- Partition Tolerance: The system keeps working even if communication between some of its nodes fails.
Q9. Is it possible to use NoSQL in an Oracle-based database?
Yes, it is possible to use NoSQL with an Oracle-based database. With the help of the external table feature, records in the NoSQL database can be retrieved or queried from the Oracle database.
Q10. What is “Polyglot Persistence” in NoSQL?
The term Polyglot Persistence borrows from the idea of writing an application in a mix of languages, so that each problem is handled in the language best suited to it rather than solving every problem in a single language. The same concept is applied to storing data in NoSQL. Instead of forcing all data into a single storage system, developers choose several data storage systems, each matched to the kind of data it stores. Hence, polyglot persistence is the use of multiple data storage technologies to handle different types of data storage needs.
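A rough sketch of polyglot persistence is shown below. Three in-memory classes stand in for a key-value store, a document store, and a graph store; the class names, methods, and sample data are all hypothetical, and a real application would talk to actual database clients instead.

```python
class SessionCache:
    """Stand-in for a key-value store holding short-lived session data."""

    def __init__(self):
        self._data = {}

    def set(self, token, user_id):
        self._data[token] = user_id

    def get(self, token):
        return self._data.get(token)


class OrderDocuments:
    """Stand-in for a document store holding nested order records."""

    def __init__(self):
        self._docs = []

    def insert(self, doc):
        self._docs.append(doc)

    def find_by_user(self, user_id):
        return [d for d in self._docs if d["user_id"] == user_id]


class FollowerGraph:
    """Stand-in for a graph store holding who-follows-whom edges."""

    def __init__(self):
        self._edges = {}

    def follow(self, follower, followee):
        self._edges.setdefault(follower, set()).add(followee)

    def following(self, user_id):
        return sorted(self._edges.get(user_id, set()))


# One application, three storage technologies, each used for what it does best.
sessions, orders, graph = SessionCache(), OrderDocuments(), FollowerGraph()
sessions.set("token-abc", "u1")
orders.insert({"user_id": "u1", "items": [{"sku": "book", "qty": 2}]})
graph.follow("u1", "u2")
graph.follow("u1", "u3")

current_user = sessions.get("token-abc")
profile = {
    "user": current_user,
    "orders": orders.find_by_user(current_user),
    "following": graph.following(current_user),
}
```

The application code combines the three results into one profile, but each kind of data lives in the store whose model fits it.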
Q11. What do you understand about the term Big SQL in NoSQL?
IBM developed Big SQL, a high-performance SQL engine for querying enterprise data stored on big data platforms such as Hadoop. Big SQL uses MPP (massively parallel processing) to handle large amounts of data securely.
Q12. What is the importance of Impala in the NoSQL database?
Impala is known for its ability to run low-latency queries on big data. It brings the parallel processing of database technology to data stored in Hadoop: a query is split up and executed in parallel across the nodes of the cluster. This parallel processing reduces the time needed to fetch data and improves the system’s performance.
Q13. What type of data can we manage in NoSQL?
NoSQL has a flexible data model for managing semi-structured and unstructured data easily.
Q14. How do you decide when a NoSQL database is preferable to an RDBMS?
A NoSQL database is preferable in the following situations:
- When the data to be stored is semi-structured or unstructured.
- When we need to store data in key-value format with very high-speed performance.
- When we want to avoid running multiple expensive JOIN queries.
- When the client needs to support a high-traffic site.
Q15. What does BASE stand for in NoSQL?
In NoSQL, the terminology BASE stands for:
- Basically Available
- Soft State
- Eventually Consistent
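The “eventually consistent” part of BASE can be sketched with a toy replicated store. A write is accepted by one replica right away and copied to the others later, so a read from another replica may briefly return stale data. The classes and the explicit sync step are assumptions made for illustration; real systems replicate in the background.

```python
class Replica:
    """One copy of the data."""

    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Toy store: writes land on replica 0 and reach the others on sync()."""

    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # writes not yet copied to the other replicas

    def write(self, key, value):
        # Basically available: the write succeeds without waiting for all replicas.
        self.replicas[0].data[key] = value
        self.pending.append((key, value))

    def read(self, key, replica_index):
        # Soft state: the answer depends on which replica is asked, and when.
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        # Eventually consistent: once replication runs, all replicas agree.
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("likes", 10)
stale = store.read("likes", replica_index=2)  # replica 2 has not seen the write yet
store.sync()
fresh = store.read("likes", replica_index=2)  # now matches replica 0
```

Before sync() runs, replica 2 has no value for the key; afterwards every replica returns the same value.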
Q16. What is scaling in a database, and how can we scale a database?
Scaling is the ability to increase the capacity of a database system so that it can store a huge amount of data without affecting performance.
Databases can be scaled in two ways:
- Vertically: Vertical scaling increases hardware capacity (e.g., CPU, RAM) by adding more resources to existing machines, which enhances the server’s processing power.
- Horizontally: Horizontal scaling increases database capacity by adding more servers and distributing the data across them.
Q17. What is the meaning of Denormalization?
Denormalization is not the reverse of normalization. Instead, it is a data optimization technique applied after normalization. Denormalization adds redundant data to multiple tables, which helps us avoid expensive joins in a relational database.
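As a small illustration, the sketch below starts from two normalized tables and copies the customer fields into every order so that reads no longer need a join. The table contents and the denormalize helper are hypothetical.

```python
# Normalized form: customer details live in one place and orders reference them.
customers = {1: {"name": "Asha", "city": "Pune"}}
orders = [
    {"order_id": 100, "customer_id": 1, "total": 250},
    {"order_id": 101, "customer_id": 1, "total": 90},
]


def denormalize(order_rows, customer_rows):
    """Copy the customer fields into every order row (redundant on purpose)."""
    return [
        {
            **order,
            "customer_name": customer_rows[order["customer_id"]]["name"],
            "customer_city": customer_rows[order["customer_id"]]["city"],
        }
        for order in order_rows
    ]


wide_orders = denormalize(orders, customers)
# Reading an order now needs no lookup in the customers table.
first_order_city = wide_orders[0]["customer_city"]
```

The trade-off is that the customer’s name and city are now stored once per order, so an update to the customer has to be written in several places.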
Q18. What are the limitations of the NoSQL database?
Below are some limitations or disadvantages of the NoSQL database:
- Security: Security is the first and most critical aspect when evaluating any technology. Data security cannot be compromised in any situation, yet NoSQL is still progressing toward better security.
- Scalability: NoSQL is generally more scalable than SQL, but it does not always provide complete scalability. For example, many NoSQL databases do not provide automatic sharding, that is, automatically spreading a database across various nodes. A database that can’t shard automatically can’t be expected to scale up or down automatically.
- Risk of Data Consistency: ACID transactions are the most trusted technique for keeping data consistent across the entire database, but many NoSQL databases do not fully support them. Instead, they follow the concept of “eventual consistency,” which improves performance but does not guarantee that every read returns the latest data.
Conclusion
This blog covers most of the frequently asked interview questions on NoSQL for freshers that could be asked in data science, data analyst, and big data developer interviews. Using these interview questions as a reference, you can better understand the concept of NoSQL and start formulating practical answers for upcoming interviews. The key takeaways from this NoSQL blog are:
- NoSQL is hugely popular in big data and real-time web apps, and its use keeps growing.
- NoSQL allows multinational companies, like Google and Facebook, to deliver cloud-based services that store data in real time.
- Polyglot Persistence means using multiple data storage technologies so that each type of data is handled by the storage system best suited to it.
- With the help of scaling, one can increase the capacity of a database system to store a vast amount of data without affecting performance.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.