Contents
1.1 Connections Per Host
The number of physical connections that a Mongo client can establish with the mongod process.
By default: 100
This can be increased or decreased based on the application's requirements. It's good practice to identify the maximum number of connections needed at average load in your application, so that no threads are kept waiting for an available connection. The correct value improves performance and helps you manage resources properly, since each connection uses a certain amount of RAM.
1.2 Connection Time Out
The number of milliseconds the driver waits before a connection attempt is considered failed.
By default: 10 * 1000 milliseconds
In normal scenarios the driver can connect to the mongod instance within a fraction of a second.
1.3 Threads Allowed To Block For Connection Multiplier
A multiplier for connectionsPerHost that denotes the number of threads allowed to wait for a connection to become available when the pool is exhausted. For example, if connectionsPerHost is 100 (the default) and this value is 5, then up to 500 threads can block before an exception is thrown.
NOTE: By setting connectionsPerHost to a correct value, we can reduce the number of threads waiting for an available connection.
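The limit described above is a simple product, which can be sketched in plain Java. This is only an illustration of the arithmetic; the class and method names (PoolLimits, maxWaitingThreads) are hypothetical and not part of the MongoDB driver.

```java
class PoolLimits {
    // Hypothetical helper: upper bound on the number of threads that may
    // wait for a connection before the driver starts throwing exceptions.
    static int maxWaitingThreads(int connectionsPerHost, int multiplier) {
        return Math.multiplyExact(connectionsPerHost, multiplier);
    }

    public static void main(String[] args) {
        // Values from the example in the text: 100 connections, multiplier 5.
        System.out.println(maxWaitingThreads(100, 5)); // prints 500
    }
}
```

Lowering connectionsPerHost without adjusting the multiplier also lowers the number of threads that are allowed to wait.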
1.4 Max Wait Time
The number of milliseconds a thread can wait for a connection to become available in the connection pool when the pool is exhausted. An exception is raised if no connection becomes available in time.
By default: 1000 * 60 * 2 ms
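To make the interaction between pool size and wait time concrete, here is a minimal model in plain Java. It is not the driver's implementation: BoundedPoolSketch is a hypothetical class that uses a Semaphore to stand in for the connection pool, and the exception type is chosen only for illustration.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class BoundedPoolSketch {
    private final Semaphore permits;      // one permit per pooled connection
    private final long maxWaitTimeMillis; // plays the role of maxWaitTime

    BoundedPoolSketch(int connectionsPerHost, long maxWaitTimeMillis) {
        this.permits = new Semaphore(connectionsPerHost, true);
        this.maxWaitTimeMillis = maxWaitTimeMillis;
    }

    // Waits for a free connection; fails once the wait time has elapsed.
    void acquire() {
        try {
            if (!permits.tryAcquire(maxWaitTimeMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException(
                        "Timed out after " + maxWaitTimeMillis + " ms waiting for a connection");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a connection", e);
        }
    }

    void release() {
        permits.release();
    }

    // Returns true if a second acquire on an exhausted one-connection pool times out.
    static boolean secondAcquireTimesOut() {
        BoundedPoolSketch pool = new BoundedPoolSketch(1, 50);
        pool.acquire();     // takes the only connection
        try {
            pool.acquire(); // pool exhausted: waits 50 ms, then fails
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondAcquireTimesOut());
    }
}
```

A longer wait time hides short bursts of load, while a shorter one surfaces an undersized pool more quickly.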
1.5 Write Concern
The write concern controls how write operations are acknowledged, and through that whether Mongo raises an error when a write fails. It offers several options.
By default: WriteConcern.ACKNOWLEDGED
With this option, the write operation waits for acknowledgement from the primary server before returning. It raises an error/exception on network failures and server errors.
It is probably a good idea to set it to WriteConcern.JOURNALED, which waits for the server to group commit to the journal file on disk. Use this if you are worried about durability: it makes sure none of your written data is lost even if mongod terminates due to a failure before writing to the data files.
If you are not worried about durability and are only concerned about failures, use WriteConcern.SAFE, which waits for acknowledgement from the primary server before returning and raises an error/exception on network as well as server failures. In the Java driver, SAFE is the older name for the same level of acknowledgement as ACKNOWLEDGED.
Note: acknowledged writes have a little more performance impact than unacknowledged ones, since the driver sends a getLastError() command after your write operation, and the connection is reserved until that command completes.
Preferred: WriteConcern.JOURNALED
NOTE: Starting with MongoDB 2.6, write operations fail with an exception if WriteConcern.JOURNALED is used while the server is running without journaling. If you are not worried about the outcome of the write operation, you can use WriteConcern.UNACKNOWLEDGED (write operations return as soon as the data is written to the socket; an exception is raised only on network failure) or WriteConcern.NONE (no exceptions are raised, even for network issues).
1.6 Read Preference
Represents the preferred replica set members/nodes to which a query or command can be sent.
It has different options:
Default: ReadPreference.primary() - all reads go to the primary member/node in the replica set.
Note: use this if you want all reads to always return consistent, most recently written data.
ReadPreference.primaryPreferred() - all reads go to the primary member if possible, but may query secondary members if the primary is not available.
ReadPreference.secondary() - all reads go to secondary members in the replica set, and the primary member/node is used for writes only. Reads become eventually consistent because of possible replication latency. More secondary nodes can be added to scale up read performance, but there is a limit on the number of secondary nodes a replica set can have.
ReadPreference.secondaryPreferred() - all reads go to secondary nodes if any of them are available; if not, reads are routed to the primary member of the replica set.
ReadPreference.nearest() - all reads go to the replica set node/member nearest to the client/application. Use only if eventually consistent reads are acceptable.
Preferred: If your requirement (for instance ETL, analytics or reporting) allows eventually consistent reads, use ReadPreference.secondaryPreferred(); otherwise always use the default setting, ReadPreference.primary().
1.7 Set Mongo configurations using Spring Data
@Configuration
@PropertySource(value = "classpath:/mongo.properties")
@Profile({ "default" })
public class LocalMongoConfig {

    @Autowired
    Environment env;

    @Bean
    public MongoClient mongoClient() throws UnknownHostException {
        MongoClientOptions.Builder builder = new MongoClientOptions.Builder();
        builder.connectionsPerHost(50);
        builder.writeConcern(WriteConcern.JOURNALED);
        builder.readPreference(ReadPreference.secondaryPreferred());
        MongoClientOptions options = builder.build();
        MongoClient mongoClient = new MongoClient(
                new ServerAddress(env.getProperty("mongo.server"),
                        Integer.parseInt(env.getProperty("mongo.port"))),
                options);
        return mongoClient;
    }

    @Bean
    public MongoDbFactory mongoDbFactory() throws UnknownHostException {
        MongoDbFactory mongoDbFactory = new SimpleMongoDbFactory(mongoClient(),
                env.getProperty("mongo.databaseName"),
                new UserCredentials(env.getProperty("mongo.userName"),
                        env.getProperty("mongo.password")));
        return mongoDbFactory;
    }

    @Bean
    public MongoTemplate mongoTemplate() throws UnknownHostException {
        MongoTemplate mongoTemplate = new MongoTemplate(mongoDbFactory());
        return mongoTemplate;
    }
}
mongo.properties
mongo.server=localhost
mongo.port=27017
mongo.databaseName=Test
mongo.userName=
mongo.password=
Note: All other configuration settings
(MongoClientOptions) will be the default ones.
2.1 Create Indexes
Create indexes on frequently queried fields to avoid full collection scans and to improve performance.
2.1.1 Single Field Index
A single field index can be created by annotating a field of a domain object with @Indexed.
Avoid using a unique index where you can ensure the uniqueness of the document at the application level. This improves performance, since MongoDB can skip the uniqueness check while inserting documents into the collection.
@Indexed // by default the index direction is ASCENDING
private Long employeeId;

@Indexed(direction = IndexDirection.DESCENDING)
private DateTime enrolledDateTime;
2.1.2 Compound Index
If your query is based on multiple keys, construct a compound index instead of creating multiple single field indexes.
A compound index needs to be constructed with fields in the following order:
1. Fields involved in equality criteria
2. Fields involved in range criteria
Query query = new Query(new Criteria().andOperator(
        Criteria.where("serialNumber").is(serialNumber),
        new Criteria().orOperator(
                new Criteria().andOperator(
                        Criteria.where("startDateTime").gte(startDateTime.withTimeAtStartOfDay().toDate()),
                        Criteria.where("startDateTime").lt(endDateTime.toDate())
                ),
                new Criteria().andOperator(
                        Criteria.where("startDateTime").lt(startDateTime.withTimeAtStartOfDay().toDate()),
                        Criteria.where("endDateTime").gt(startDateTime.toDate())
                )
        )
));
For this query to execute without a full collection scan, construct a compound index on the domain object Employee. The following compound index ensures an index scan (the output of explain() reports the cursor type as "BtreeCursor" followed by the index name).
@CompoundIndex(name = "slNo_dt_idx", def = "{'serialNumber' : 1, 'startDateTime' : 1, 'endDateTime' : 1}")
public class Employee {
    ….
}
Here serialNumber is involved in the equality criteria; startDateTime and endDateTime are involved in the range criteria.
A compound index created in this order gives n = nscanned = nscannedObjects in the output of explain(), which means it is the best possible index to use.
· n - number of documents returned
· nscanned - number of index entries scanned
· nscannedObjects - number of documents scanned
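The n = nscanned = nscannedObjects rule can be turned into a small check, for example to flag queries in a test run. The sketch below is plain Java; the class and method names (ExplainCheck, isIdealIndexUse) are hypothetical, and the numbers in main are made up for illustration rather than taken from a real explain() run.

```java
class ExplainCheck {
    // True when every index entry scanned and every document fetched
    // ended up in the result, i.e. no work was wasted.
    static boolean isIdealIndexUse(long n, long nscanned, long nscannedObjects) {
        return n == nscanned && nscanned == nscannedObjects;
    }

    public static void main(String[] args) {
        // Hypothetical numbers for illustration only.
        System.out.println(isIdealIndexUse(12, 12, 12));   // prints true
        System.out.println(isIdealIndexUse(12, 480, 480)); // prints false
    }
}
```

The three values would be read from the explain() output of the query under test.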
2.2 Use Covered Indexes
Try to use covered indexes if possible.
A covered index means that the fields included in the result set and the fields used in the query are all part of a single index. This lets Mongo return the result from the index without scanning the documents. Note that _id is returned by default, so it has to be excluded from the projection unless it is part of the index.
Query q1 = new Query(Criteria.where("code").in("abc", "ijk", "xyz"));
// return only "name"; _id is excluded because it is not in the index
q1.fields().include("name").exclude("_id");

@CompoundIndex(name = "code_name_idx", def = "{'code' : 1, 'name' : 1}")
2.3 Avoid long Field Names
Avoid unnecessarily long field names. Field names are repeated in every document and consume space. Shorter field names allow a larger number of documents to fit in RAM.
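A rough estimate shows how quickly this adds up. The sketch below is plain Java and only counts the raw bytes of the field name itself, which is stored once per document; it ignores any storage-level compression. The class name FieldNameCost and the shortened name "enrolled" are hypothetical choices for illustration.

```java
import java.nio.charset.StandardCharsets;

class FieldNameCost {
    // Bytes saved across a collection when a field name is shortened,
    // counting only the UTF-8 bytes of the name in each document.
    static long bytesSaved(String longName, String shortName, long documentCount) {
        long perDocument = longName.getBytes(StandardCharsets.UTF_8).length
                - shortName.getBytes(StandardCharsets.UTF_8).length;
        return Math.multiplyExact(perDocument, documentCount);
    }

    public static void main(String[] args) {
        // 16-byte name shortened to 8 bytes, over one million documents.
        System.out.println(bytesSaved("enrolledDateTime", "enrolled", 1_000_000L)); // prints 8000000
    }
}
```

The saving applies to every field in every document, so it matters most for collections with many small documents.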
2.4 Include only Updated Fields in the Update Query
Use an Update object to modify only the fields that have changed, instead of retrieving the entire document in your application, updating fields and then saving the document back to the database.
For example, say the salary of the employee with serialNumber 1009 has changed. Instead of doing the following:
Employee employee = mongoTemplate.findOne(
        new Query(Criteria.where("serialNumber").is(1009)), Employee.class);
employee.setSalary(<updatedSalary>);
mongoTemplate.save(employee, "Employee");
Downside of this approach: it issues two queries and sends the entire object just to update the salary field.
In such cases we should use:
Update update = new Update();
update.set("salary", <updatedSalary>);
mongoTemplate.updateFirst(
        new Query(Criteria.where("serialNumber").is(1009)), update, "Employee");
Advantage: this issues just one query and sends only the updated field as part of the update.
2.5 Use Projections to Reduce the Amount of Data Returned
Use projections to avoid returning unnecessary data to the application. Include only the fields you need in the projection.
This can be achieved by:
Query query = new Query(Criteria.where("name").is("Adapter"));
query.fields().include("code").include("tags");
List<Inventory> inventoryList = mongoTemplate.find(query, Inventory.class);
In this case, only two fields (code, tags) of the Inventory class are returned and the other fields are excluded. This improves query latency.
2.6 Use bulk insert instead of individual inserts
It's good practice to use bulk inserts through the insert(Collection<? extends Object> batchToSave, String collectionName) method instead of multiple individual inserts through the insert(Object objectToSave, String collectionName) method.
This reduces the number of calls to the database, and with it the number of network round trips.
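When the number of documents is large, one way to apply this is to split them into batches and pass each batch to the collection-based insert method. The sketch below shows only the batching step in plain Java; BatchSketch and partition are hypothetical names, and the batch size of 1000 is an assumption, not a recommendation from the driver or Spring Data.

```java
import java.util.ArrayList;
import java.util.List;

class BatchSketch {
    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add(i);
        }
        // 2500 individual inserts would mean 2500 calls; batches of 1000 need 3.
        System.out.println(partition(docs, 1000).size()); // prints 3
    }
}
```

Each resulting batch would then go to the database in a single insert call.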
2.7 Automatic Deletion of Documents From Collection
If you need to remove documents from a collection once a certain amount of time has elapsed, set the TTL (Time To Live) option using the annotation @Indexed(expireAfterSeconds = <no. of seconds>) on a date field in the document.
@Indexed(expireAfterSeconds = 604800)
private DateTime createdDateTime;
In this case, MongoDB automatically deletes the document from the collection 7 days after its creation date time.
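The value 604800 is simply 7 days expressed in seconds. A small plain-Java helper can be used to double-check such literals; TtlSeconds and daysToSeconds are hypothetical names. Since annotation attributes must be compile-time constants, the result would still be written into the annotation as a literal.

```java
import java.time.Duration;

class TtlSeconds {
    // Converts a retention period in days to a number of seconds.
    static long daysToSeconds(long days) {
        return Duration.ofDays(days).getSeconds();
    }

    public static void main(String[] args) {
        System.out.println(daysToSeconds(7)); // prints 604800
    }
}
```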
2.8 Use Capped Collections To Store Logs OR Small Caches
It's good practice to use capped collections rather than normal collections to store documents for high-throughput operations. Capped collections are fixed-size collections that support high-throughput operations that insert and retrieve documents based on insertion order.
Create the capped collection by:
private void createCappedCollections() {
    // size, max number of documents, capped
    CollectionOptions options = new CollectionOptions(100000, 50, true);
    mongoTemplate.createCollection(AppLogger.class, options);
}
Note: You cannot create capped collections the way you create normal collections, by just specifying the collection name in the @Document() annotation. For capped collections you need to specify the size, the maximum number of documents etc. (CollectionOptions). And since a capped collection preserves insertion order, you don't need to create indexes to get the documents back in insertion order.
Note: You can check whether the created collection is a capped collection by using
mongoTemplate.getCollection(<CollectionName>).isCapped()
3.1 Test every query in your application with explain()
Spring Data MongoDB doesn't provide a utility method for viewing the query plan. But you can write a generic custom method that uses the explain() method from the Mongo Java driver through MongoTemplate to evaluate the query plan (for example, which index is used or whether a full collection scan occurred).
Include the following method in the DAO class to monitor the query plan:
public void performExplainQuery(Query query, String collectionName) {
    DBCollection dbCollection = mongoTemplate.getCollection(collectionName);
    DBCursor cursor = dbCollection.find(query.getQueryObject());
    System.out.println("Query Plan: " + cursor.explain());
}
Invoke this from your DAO methods with two parameters: the query to evaluate and the name of the collection on which the query needs to be executed.
3.2 Add Audit Entries to Model Objects
Define an audit object with the fields createdOn, updatedOn and version. Optionally it can also have createdBy and updatedBy. This records when a document was created and last updated. We found this very helpful while debugging and tracing details.
public class Audit {

    @CreatedDate
    private DateTime createdOn;

    @LastModifiedDate
    private DateTime updatedOn;

    @Version
    private Long version;

    public DateTime getCreatedOn() {
        return createdOn;
    }

    public void setCreatedOn(DateTime createdOn) {
        this.createdOn = createdOn;
    }

    public DateTime getUpdatedOn() {
        return updatedOn;
    }

    public void setUpdatedOn(DateTime updatedOn) {
        this.updatedOn = updatedOn;
    }

    public Long getVersion() {
        return version;
    }

    public void setVersion(Long version) {
        this.version = version;
    }
}

@Document(collection = "Employee")
public class Employee extends Audit {
    …
}
On update of a document (for instance Employee), the updatedOn field is updated automatically, and on create the createdOn field is populated. The version field is used for optimistic locking and is incremented automatically on update.
To get this working, you need to add the @EnableMongoAuditing annotation to the configuration class.
@Configuration
@PropertySource(value = "classpath:/mongo.properties")
@Profile({ "default", "local-noproxy" })
@EnableMongoAuditing
public class LocalMongoConfig {
To populate the createdBy and updatedBy fields, Spring Data MongoDB gets the current user from the session or the Spring Security context, depending on the application settings (typically through an AuditorAware bean that you provide).
3.3 Enable Optimistic Locking on Write Operations
Implement optimistic locking using the @Version annotation. If two threads try to update the same object at the same time, one of them fails with an error saying the version is different from the one the thread holds, because the first thread incremented the version while updating the object. The other thread is therefore working on stale data. In that case, we can implement some kind of retry mechanism that fetches the latest object and merges the changes before updating it in Mongo.
@Version
private Long version;
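The retry idea can be illustrated without a database. The sketch below is a plain-Java model, not Spring Data code: OptimisticRetrySketch is a hypothetical class in which an AtomicReference stands in for the stored document and compareAndSet plays the role of the version check. In a real application the failed write would surface as an exception from the save call, and the loop would catch it, reload the object and try again.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.IntUnaryOperator;

class OptimisticRetrySketch {
    // Immutable snapshot of a document: a payload plus its version counter.
    static final class Versioned {
        final long version;
        final int salary;

        Versioned(long version, int salary) {
            this.version = version;
            this.salary = salary;
        }
    }

    // Stands in for the stored document.
    private final AtomicReference<Versioned> stored =
            new AtomicReference<>(new Versioned(0L, 1000));

    // Re-reads the latest state and re-applies the change until the
    // write succeeds or the attempts run out.
    boolean updateWithRetry(IntUnaryOperator change, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Versioned current = stored.get(); // fetch the latest object
            Versioned next = new Versioned(current.version + 1,
                    change.applyAsInt(current.salary));
            if (stored.compareAndSet(current, next)) {
                return true; // nobody else wrote in between
            }
            // someone else updated first: loop and merge onto fresh data
        }
        return false;
    }

    Versioned read() {
        return stored.get();
    }

    public static void main(String[] args) {
        OptimisticRetrySketch employee = new OptimisticRetrySketch();
        boolean saved = employee.updateWithRetry(salary -> salary + 100, 3);
        System.out.println(saved + " version=" + employee.read().version
                + " salary=" + employee.read().salary);
    }
}
```

Bounding the number of attempts keeps a heavily contended document from retrying forever; the caller decides what to do when the update finally fails.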
3.4 Evaluate the Performance of each DAO Method using Spring AOP
It's good practice to measure the performance of each DAO operation and see whether any special investigation is required to improve it. We can write a Spring around advice to do this.
/**
 * Aspect that implements automatic logging on performance of the data access
 * queries with Spring Data MongoDB.
 *
 * @author Felix Jose
 */
@Aspect
@Component("PerformanceProfilerAspect")
public class PerformanceProfilerAspect {

    @Pointcut("execution(* com.felix.dao.*.*(..))")
    public void clientMethodPointcut() {
    }

    /**
     * Log on the performance of the interactions/queries on MongoDB.
     *
     * @param joinPoint the join point
     * @throws Throwable the throwable
     */
    @Around("clientMethodPointcut()")
    public Object retryOnConnectionException(ProceedingJoinPoint joinPoint) throws Throwable {
        Object ret = null;
        System.out.println("PerformanceProfilerAspect: Advised with logic to calculate the Time Taken for the execution of the method ["
                + joinPoint.getSignature() + "]");
        StopWatch stopWatch = new StopWatch();
        stopWatch.start();
        String throwableName = null;
        try {
            ret = joinPoint.proceed();
        } catch (Throwable t) {
            throwableName = t.getClass().getName();
            throw t;
        } finally {
            stopWatch.stop();
            if (throwableName != null) {
                System.out.println("Timed [" + joinPoint.getSignature().toString() + "]: "
                        + stopWatch.getTotalTimeMillis() + " milliseconds , with exception [" + throwableName + "]");
            } else {
                System.out.println("Timed [" + joinPoint.getSignature().toString() + "]: "
                        + stopWatch.getTotalTimeMillis() + " milliseconds");
            }
        }
        return ret;
    }
}
Note: You can use a logger implementation to log the performance statistics instead of System.out.println().
4.1 Upsert Without Update Object
When you don't know whether an object is already present in its Mongo collection, you should use upsert/save. The problem with MongoTemplate.upsert() is that it expects an Update instance populated with all the updated fields. This can be tedious when the number of fields in the domain object is huge and the number of updated fields is unknown. In that case we cannot use MongoTemplate.upsert().
The other option is the MongoTemplate.save() method, which accepts the domain object as its argument. The Mongo Java driver checks whether the id field is present in the object: if the id is present the object is considered for update, otherwise it is inserted. The drawback of this approach is that we first need to send a query to fetch the object from Mongo, update its fields with the changes, and then pass it to the save method. So there are two db calls.
The best approach is to use the mongoTemplate.execute() method as follows:
public boolean persistEmployee(final Employee employee) throws Exception {
    final BasicDBObject dbObject = new BasicDBObject();
    mongoTemplate.getConverter().write(employee, dbObject);
    mongoTemplate.execute(Employee.class, new CollectionCallback<Object>() {
        public Object doInCollection(DBCollection collection) throws MongoException, DataAccessException {
            collection.update(
                    new Query(Criteria.where("name").is(employee.getName())).getQueryObject(),
                    dbObject,
                    true,   // upsert
                    false); // multi update
            return null;
        }
    });
    return true;
}
This avoids making a separate call to Mongo to fetch the object, updating the received object with the actual changes, and then sending the updated object back to Mongo.
4.2 Don't Use ID field in Domain Objects
If your domain object has an id field or any field annotated with @Id, and this object is involved in an upsert through MongoTemplate.execute with collection.update and the upsert option set to true, then the MongoDB Java driver inserts the document if it is not present, or updates it if it is present, when the given query is executed.
But the inserted document will have _id populated as null if your application didn't assign any value to it.
@Document(collection = "Employee")
public class Employee {
    @Id
    private ObjectId id;
    @Indexed
    private String serialNumber;
    @NotNull
    ……
}
BasicDBObject dbObject = new BasicDBObject();
sdcMongoTemplate.getConverter().write(employee, dbObject);
sdcMongoTemplate.execute(Employee.class, new CollectionCallback<Object>() {
    @Override
    public Object doInCollection(DBCollection collection) throws MongoException, DataAccessException {
        collection.update(
                (new Query(Criteria.where("serialNumber").is(serialNumber))).getQueryObject(),
                dbObject, true, false);
        return null;
    }
});
This creates a document for the first time with _id = null:
{ "_id" : null, "_class" : "com.felix.Employee", "serialNumber" : "15050803",…..
But if you comment out the id field in the domain class, the MongoDB Java driver automatically populates the _id field in the document while inserting.
@Document(collection = "Employee")
public class Employee {
    /*@Id
    private ObjectId id;*/
    @Indexed
    private String serialNumber;
    @NotNull
    ……
}
BasicDBObject dbObject = new BasicDBObject();
sdcMongoTemplate.getConverter().write(employee, dbObject);
sdcMongoTemplate.execute(Employee.class, new CollectionCallback<Object>() {
    @Override
    public Object doInCollection(DBCollection collection) throws MongoException, DataAccessException {
        collection.update(
                (new Query(Criteria.where("serialNumber").is(serialNumber))).getQueryObject(),
                dbObject, true, false);
        return null;
    }
});
{ "_id" : ObjectId("562138337ae8629d987f01e6"), "_class" : "com.felix.Employee", "serialNumber" : "15050803",…..
4.3 Remove Unnecessary _class Field
By default, Spring Data MongoDB includes a _class field in each document, pointing to the entity's fully-qualified class name, as a hint about which type to instantiate. This is needed because a Mongo collection can contain documents that represent instances of a variety of types, e.g. when storing a hierarchy of classes (inheritance).
If you want to avoid writing the entire Java class name as type information and would rather use some key, you can use the @TypeAlias annotation on the entity class being persisted.
@Document(collection = "Inventory")
@TypeAlias("Inventory")
public class Inventory {
But when you retrieve documents using MongoTemplate.find..() and pass the actual entity type to which the documents are to be converted (e.g. MongoTemplate.find(new Query(), Inventory.class)), you may not need the _class field in the corresponding documents at all. In that case, we can stop Spring Data MongoDB from including the _class field in each document by defining:
@Bean
public MongoTemplate mongoTemplate() throws UnknownHostException {
    MappingMongoConverter mappingMongoConverter = new MappingMongoConverter(
            new DefaultDbRefResolver(mongoDbFactory()), new MongoMappingContext());
    // a null type key stops the converter from writing the _class field
    mappingMongoConverter.setTypeMapper(new DefaultMongoTypeMapper(null));
    return new MongoTemplate(mongoDbFactory(), mappingMongoConverter);
}
This helps reduce the document size, since _class would otherwise be present in every document and consume space.