Y NOT - Explore MongoDB, a Schema-less, Document-Based NoSQL Database

Data is generally stored as files in a folder structure on storage devices. Storing and retrieving data (technically, I/O operations) plays a significant part in performance, which ultimately shapes the customer experience.

If the same data is organized properly (data modeling), I/O operations can improve considerably. Broadly speaking, there are two approaches to handling data within a database:

  • SQL (Structured Query Language) - a structured way of maintaining data, which is stored in tables with fixed columns and fixed data types. E.g., Oracle, SQL Server, PostgreSQL, MariaDB, etc.
  • NoSQL - non-tabular databases that do not enforce a fixed schema, which gives flexibility when writing data (such as accommodating new fields on the fly). They come in several flavors: document (MongoDB), key-value (Redis), wide-column (Cassandra, HBase) and graph (Neo4j).

MongoDB - a document-based NoSQL database in which each record (the counterpart of a row) is an independent document. Documents with different structures can be stored in the same collection (analogous to a table in the SQL world, but far more flexible). Each document is stored and retrieved as JSON-like key-value pairs (internally MongoDB uses BSON, a binary form of JSON).

mongod (Mongo Daemon) is the host process for the database. When you start mongod you are basically saying "start the MongoDB process and run it in the background". mongo is the command-line shell that connects to a specific instance of mongod (recent MongoDB releases ship mongosh in its place).

Basic MongoDB Commands:

Open a command prompt (this walkthrough assumes Windows) and type mongo to open the mongo client. On startup it displays the installed version and the connection string of the server it connected to.

  • show dbs - lists all available databases.
  • show collections - lists all collections in the current database.
  • use databaseName - switches to the database named databaseName. If it does not exist yet, MongoDB creates it once data is first stored in it.
  • db.createCollection("collectionName") - db refers to the current database. This command explicitly creates a new collection named collectionName (a collection is also created automatically on the first insert into it).
  • db.collectionName.insert({ "name": "nagaraj" }) - inserts a new document into collectionName; this document contains only the name attribute. Newer shells prefer insertOne() for the same purpose.
  • db.collectionName.find() - shows all the documents in collectionName.
  • db.collectionName.find({ "name": "nagaraj" }) - returns every document in collectionName whose name attribute is nagaraj.
  • db.collectionName.drop() - drops (deletes) the collection.
  • db.dropDatabase() - drops the current database.
  • db.collectionName.find().pretty() - pretty() formats the JSON response so that it is easier to read.

Assume a customers collection whose documents hold information such as name, phone, city and age.

  • db.customers.find({ "age": { $gt: 22 } }) - returns the documents whose age is greater than 22.
  • db.customers.find({ "name": "Tom" }, { "city": 0 }) - returns the documents whose name is "Tom" and excludes the city field from the JSON response.
  • db.customers.find({ "name": "Tom" }, { "city": 1, "name": 1 }) - returns the documents whose name is "Tom" and includes only the name and city fields in the JSON response (plus _id, which is returned unless explicitly excluded).
  • db.customers.update({ "name": "Tom" }, { $set: { "age": 25 } }) - finds a document with name "Tom" and sets its age attribute to 25. By default only the first matching document is updated; updateMany() updates all matches. If the attribute does not exist, it is created on the fly.
  • db.customers.deleteOne({ "name": "Tom" }) - deletes the first document with name "Tom".
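To make the filter and projection semantics above concrete, here is a minimal sketch in plain JavaScript that imitates them on an in-memory array. It does not talk to MongoDB; the sample documents and the helper names (ageGreaterThan, includeFields, excludeFields) are hypothetical and exist only for illustration.

```javascript
// Plain JavaScript, no MongoDB required. Data and helper names are hypothetical.
const customers = [
  { _id: 1, name: "Tom", city: "Austin", age: 24 },
  { _id: 2, name: "Ana", city: "Boston", age: 21 },
  { _id: 3, name: "Tom", city: "Denver", age: 30 },
];

// Imitates find({ age: { $gt: 22 } }): keep documents whose age exceeds the bound.
function ageGreaterThan(docs, bound) {
  return docs.filter((doc) => doc.age > bound);
}

// Imitates an inclusion projection such as { city: 1, name: 1 }.
// As in MongoDB, _id is kept unless it is explicitly excluded.
function includeFields(doc, fields) {
  const out = { _id: doc._id };
  for (const field of fields) {
    if (field in doc) out[field] = doc[field];
  }
  return out;
}

// Imitates an exclusion projection such as { city: 0 }.
function excludeFields(doc, fields) {
  const out = { ...doc };
  for (const field of fields) delete out[field];
  return out;
}

const olderThan22 = ageGreaterThan(customers, 22);
const toms = customers.filter((doc) => doc.name === "Tom");
const tomsNameAndCity = toms.map((doc) => includeFields(doc, ["city", "name"]));
const tomsWithoutCity = toms.map((doc) => excludeFields(doc, ["city"]));

console.log(olderThan22.length, tomsNameAndCity, tomsWithoutCity);
```

The real server evaluates these operators itself; the sketch only mirrors the visible result of each command.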

Ways of Connecting to Mongod (Daemon Host Process):

  • Mongo CLI
  • GUI such as MongoDB Compass
  • API (used by applications built with Spring Boot, Python, etc.)

To integrate with a MongoDB server (mongod), the client or program code needs to know the server's connection string, which holds the IP address and port number:

mongodb://127.0.0.1:27017

In addition, an application using the API needs a driver (specific to each programming language) to talk to MongoDB.
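As a small illustration of how the connection string is put together, the sketch below builds one from a host and a port and then splits it again with the standard URL class. The helper name buildMongoUri is hypothetical; a real application would pass the resulting string to its language's MongoDB driver, which is not shown here.

```javascript
// Hypothetical helper: assembles a MongoDB connection string from its parts.
function buildMongoUri(host, port, dbName) {
  const base = `mongodb://${host}:${port}`;
  return dbName ? `${base}/${dbName}` : base;
}

const uri = buildMongoUri("127.0.0.1", 27017);

// The standard URL class (built into Node.js and browsers) can split it again.
const parsed = new URL(uri);
console.log(uri, parsed.protocol, parsed.hostname, parsed.port);
```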

Bulk Upload of JSON Data to MongoDB Using Mongo Import:

MongoDB provides a developer tools utility, mongoimport, for uploading a JSON file (containing many documents) in a single shot.

  • The first step is to install the Developer Tools from https://www.mongodb.com/try/download/database-tools
  • mongoimport fileName.json -d DBName -c CollectionName --jsonArray - bulk uploads all the documents in the JSON file into the given collection. If the specified database or collection does not exist, it is created automatically.
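The --jsonArray flag expects the whole file to be a single JSON array of documents. A minimal sketch of producing such content in plain JavaScript follows; the sample documents are hypothetical, and the output would still need to be saved to a file (for example by redirecting the script's output) before running mongoimport.

```javascript
// Hypothetical sample documents; note that they need not share the same fields.
const customers = [
  { name: "Tom", city: "Austin", age: 24 },
  { name: "Ana", city: "Boston" },
];

// With --jsonArray the file must hold one JSON array: [ {...}, {...} ].
const fileContents = JSON.stringify(customers, null, 2);

// Parse it back to confirm the text is a valid array of documents.
const reloaded = JSON.parse(fileContents);
console.log(fileContents);
console.log(`${reloaded.length} documents ready for import`);
```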

Document Data Model:

  • Embedded - sub-documents are nested inside one parent document. This removes the need to process multiple documents (the SQL analogy is joining multiple tables), which can significantly improve read performance.
  • Reference - the dependent document's unique ID is stored in the parent document instead of embedding the entire document. This suits cases where the same information is shared by multiple parents: updating the single referenced document is reflected for every parent that points to it.
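The two models can be sketched with plain JavaScript objects. This is only an illustration of the document shapes, not MongoDB code; the field names (address, addressId) and the resolveAddress helper are hypothetical.

```javascript
// Embedded: the address lives inside the customer document.
const embeddedCustomer = {
  _id: 1,
  name: "Tom",
  address: { city: "Austin", state: "Texas" },
};

// Reference: the address is its own document; customers store only its _id.
const addresses = [{ _id: 100, city: "Austin", state: "Texas" }];
const referencingCustomers = [
  { _id: 1, name: "Tom", addressId: 100 },
  { _id: 2, name: "Ana", addressId: 100 },
];

// Resolving a reference costs an extra lookup (a second query in a real database).
function resolveAddress(customer) {
  return addresses.find((address) => address._id === customer.addressId);
}

// One update to the shared address is visible through every referencing customer.
addresses[0].city = "Dallas";
const resolvedCities = referencingCustomers.map((c) => resolveAddress(c).city);

// The embedded copy is independent and keeps its original value.
console.log(embeddedCustomer.address.city, resolvedCities);
```

The trade-off shown here is the usual one: embedding reads in a single step, while referencing avoids keeping several copies of the same data in sync.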

Indexing:

  • Indexing is applied to speed up the search and retrieval of documents, which in turn reduces I/O.
  • An index is a way to minimize the execution time of a query.
  • The idea is to keep the indexed values in sorted order (sort once), so that search and retrieval become faster.
  • Without an index on the queried field, searching for a document in a collection uses COLLSCAN (a collection scan, which examines every document). Once a suitable index exists, the search can use IXSCAN (an index scan).
  • explain() returns the plan for the query that follows it. For example, assuming contacts is a collection of contact documents with fields such as name and age, db.contacts.explain().find({ "name.first": "carl" }) returns the query planner information and the server information.
  • explain("executionStats") adds an execution summary of the query, such as the time taken, the number of documents examined and the number of documents returned. For example: db.contacts.explain("executionStats").find({ "name.first": "carl" })
  • createIndex() - creates a new index. For example, db.contacts.createIndex({ "name.first": 1 }) creates an index on the first attribute nested under name. Here 1 keeps the index entries in ascending order, whereas -1 keeps them in descending order.
  • getIndexes() - retrieves all indexes available on the collection. For example, db.contacts.getIndexes()
  • dropIndex() - deletes an existing index on the collection. For example, db.contacts.dropIndex({ "name.first": 1 }). The key specification passed here must match the one used when the index was created.
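The difference between a collection scan and an index scan can be illustrated with a small in-memory simulation. This is a sketch under simplifying assumptions: a sorted array with binary search stands in for MongoDB's actual index structure, first names are assumed unique, and the data and function names (collectionScan, indexScan) are hypothetical.

```javascript
// Hypothetical contacts; only name.first matters for this illustration.
const firstNames = ["zoe", "carl", "amy", "bob", "dina", "eli", "fay", "gus"];
const contacts = firstNames.map((first, i) => ({ _id: i, name: { first } }));

// COLLSCAN-like: check every document, counting how many were examined.
function collectionScan(docs, first) {
  let examined = 0;
  const matches = [];
  for (const doc of docs) {
    examined += 1;
    if (doc.name.first === first) matches.push(doc);
  }
  return { matches, examined };
}

// "Create the index": sort (key, position) entries once, in ascending order.
const index = contacts
  .map((doc, position) => ({ key: doc.name.first, position }))
  .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));

// IXSCAN-like: binary search over the sorted keys, counting keys examined.
// Assumes unique keys, so at most one document is returned.
function indexScan(docs, sortedIndex, first) {
  let low = 0;
  let high = sortedIndex.length - 1;
  let examined = 0;
  while (low <= high) {
    const mid = Math.floor((low + high) / 2);
    examined += 1;
    const entry = sortedIndex[mid];
    if (entry.key === first) return { matches: [docs[entry.position]], examined };
    if (entry.key < first) low = mid + 1;
    else high = mid - 1;
  }
  return { matches: [], examined };
}

const full = collectionScan(contacts, "carl");
const viaIndex = indexScan(contacts, index, "carl");
console.log("collection scan examined:", full.examined);
console.log("index scan examined:", viaIndex.examined);
```

Both calls find the same document, but the index scan inspects fewer entries, which is the kind of difference explain("executionStats") reports for a real query.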

Aggregation (Grouping) Framework:

MongoDB has a separate aggregation framework that follows a pipeline (stage by stage execution).

A requirement to list the number of female contacts at each location, in descending order, can be met with the stages below:

  • Stage 1 - Filter only female contacts
  • Stage 2 - Group by location; each time a location matches, add 1 to its running total to obtain the count of female contacts at that location
  • Stage 3 - Sort the result in descending order of the female count

db.contacts.aggregate([
  { $match: { gender: "female" } },
  { $group: { _id: { state: "$location.state" }, totalFemale: { $sum: 1 } } },
  { $sort: { totalFemale: -1 } }
])

The aggregation framework provides stages such as $match, $group and $sort, which can be sequenced in a pipeline.
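To show what each stage of the pipeline above does to the data, here is a minimal plain JavaScript sketch that performs the same three steps on an in-memory array. It is an imitation for illustration rather than MongoDB's implementation, and the sample contacts are hypothetical.

```javascript
// Hypothetical contacts shaped like the documents in the pipeline above.
const contacts = [
  { gender: "female", location: { state: "texas" } },
  { gender: "male", location: { state: "texas" } },
  { gender: "female", location: { state: "ohio" } },
  { gender: "female", location: { state: "texas" } },
];

// Stage 1 ($match): keep only female contacts.
const females = contacts.filter((doc) => doc.gender === "female");

// Stage 2 ($group with $sum: 1): add 1 to the state's total for each document.
const counts = new Map();
for (const doc of females) {
  const state = doc.location.state;
  counts.set(state, (counts.get(state) || 0) + 1);
}
const grouped = [...counts].map(([state, totalFemale]) => ({
  _id: { state },
  totalFemale,
}));

// Stage 3 ($sort: { totalFemale: -1 }): order by count, highest first.
const result = grouped.sort((a, b) => b.totalFemale - a.totalFemale);
console.log(JSON.stringify(result));
```

Each stage receives the output of the previous one, which is the pipeline idea in miniature.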
