Launching Summly on MongoDB

Summly is an innovative iOS application providing meaningful summaries of news articles. Immediately prior to its UK launch in October, I was enlisted to help configure MongoDB for its production environment. I only had a few days to set up the environment and help the dev team work through any issues encountered in a replicated environment. This is how I did it and what I learned along the way.

The Environment

The Summly team didn’t know what kind of load their launch would generate on the data store, so we rolled out a 5-node replica set on m1.large instances, each with a 1TB EBS volume.

In retrospect this was excessive, but we wanted to err on the side of having too much capacity instead of too little. Once the environment was configured and operating normally, testing with the existing application code began.

Integration with the Replica Set

The application was configured to prefer reading from secondaries, and that’s where the application team encountered its first problem. One code path wrote to the primary and then immediately read from a secondary; because of replication lag, the written data often wasn’t available on the secondary yet, resulting in errors. There was only one access pattern like this in the code, and it was quickly resolved by using the consistent request pattern available in the MongoDB Java driver.

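The consistent request pattern ensures your reads and writes occur on the same socket, avoiding the asynchronous replication problem inherent in a replica set. Its usage is straightforward: start a request on the database object, perform the write and the dependent read, then end the request.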
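
As a rough sketch of the pattern, assuming the legacy 2.x Java driver, where the `DB` class exposes `requestStart()`, `requestEnsureConnection()` and `requestDone()`: the `RequestScope` interface below is a hypothetical stand-in for those three methods so the example compiles without the driver on the classpath, and `ConsistentRequest` and `RecordingScope` are illustrative names, not part of any library. The point it shows is the try/finally shape, so the request is always ended even when the work fails.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/**
 * Hypothetical stand-in for the request methods on the legacy (2.x)
 * driver's DB class. With the real driver you would call these on DB.
 */
interface RequestScope {
    void requestStart();            // begin a consistent request on this thread
    void requestEnsureConnection(); // make sure a connection is reserved for it
    void requestDone();             // end the request and release the connection
}

final class ConsistentRequest {
    private ConsistentRequest() {}

    /** Runs work inside a consistent request and always ends the request. */
    static <T> T run(RequestScope db, Supplier<T> work) {
        db.requestStart();
        try {
            db.requestEnsureConnection();
            return work.get(); // the write and the dependent read go here
        } finally {
            db.requestDone();  // runs even if work throws
        }
    }
}

/** Test double that records call order; not part of any driver. */
final class RecordingScope implements RequestScope {
    final List<String> calls = new ArrayList<>();

    @Override public void requestStart() { calls.add("start"); }
    @Override public void requestEnsureConnection() { calls.add("ensure"); }
    @Override public void requestDone() { calls.add("done"); }
}
```

With the real driver, the lambda passed to `run` would hold the insert and the follow-up query against the same `DB` object. Wrapping only the one read-after-write code path keeps every other read free to go to the secondaries.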

Neither of the popular MongoDB Java abstractions (Spring Data-MongoDB or Morphia) provides direct access to the consistent request pattern. This isn’t a problem since both make the underlying MongoDB driver objects accessible, but you lose most of the convenience these frameworks provide.

Deployment and Monitoring

Once the Summly application was released to Apple’s UK app store, I monitored MongoDB’s logs to make sure everything was operating normally. After creating a few missing indexes and optimizing another, the iOS application became significantly more responsive and load decreased considerably.

Lessons Learned

  • As always, create indexes for your most commonly used queries. Make sure you understand the indexing trade-offs.
  • Use some form of non-datastore cache to improve application performance.
  • Don’t launch without some way of monitoring the health of every component in your infrastructure. MongoDB Monitoring Service is an excellent option.
  • Build your application with replica sets in mind. The asynchronous nature of the replication may impact your data access patterns.
  • Hire a professional, like me. 🙂
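
To make the caching lesson a little more concrete, here is a minimal sketch of an in-process, read-through LRU cache sitting in front of the data store. This is not what Summly ran; `ReadThroughCache` and its `loader` function are hypothetical names, and the loader stands in for whatever query you would otherwise send to MongoDB.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Small LRU cache that falls back to a loader (e.g. a database query) on a miss. */
final class ReadThroughCache<K, V> {
    private final Map<K, V> entries;
    private final Function<K, V> loader;

    ReadThroughCache(final int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true orders entries from least to most recently used
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // evict once over capacity
            }
        };
    }

    /** Returns the cached value, loading and caching it on a miss. */
    synchronized V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);
            if (value != null) {
                entries.put(key, value);
            }
        }
        return value;
    }

    /** Drops a key after the underlying document changes. */
    synchronized void invalidate(K key) {
        entries.remove(key);
    }

    synchronized int size() {
        return entries.size();
    }
}
```

Calling `invalidate` after a write keeps the cache from serving stale data. For anything beyond a single application server, a shared cache outside the process is usually the better fit.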


PaaS Job Growth

In February, Gartner released a strategic look at PaaS adoption in today’s enterprise. The report all but declared an impending strategic catastrophe for enterprises failing to adopt a PaaS solution. Despite these warnings, it’s hard to blame CIOs and CTOs for moving slowly. Decision-making is complicated by a constantly changing provider landscape and pricing structure. Paired with a number of high-profile outages of legacy IaaS/PaaS, it’s understandable if executives are hesitant to fully commit to a PaaS solution.

To get a better idea of the kind of PaaS adoption enterprises may be making, I explored some job-posting trends. Specifically, I was interested in overall job growth in the PaaS space as well as which technologies might be driving adoption. First up, the overall PaaS job picture:

As a percentage of overall job postings, PaaS isn’t exactly breaking any records. However, the raw number of PaaS jobs increased significantly from 2010 to the start of 2012, and this amount of growth can’t be driven solely by service providers. Enterprises are beginning to ramp up on talent in core areas. There’s little surprise about what’s driving this growth: the big data bubble. Specifically, Java:

The growth of Java jobs in the PaaS space is understandable since most enterprises seem to be using Java. Hadoop, and to a lesser extent, Python, are also trending up:

Other technologies, such as NoSQL (most prominently MongoDB) and Node.js, have recently shown up on PaaS job openings, but are insignificant at current levels.

The job growth we’re seeing indicates two things. First, enterprises are clearly moving into PaaS, albeit tentatively. But they’re not likely to adopt any new technologies, such as NoSQL. Instead, these graphs indicate enterprises will seek to migrate existing applications and infrastructure out of current data centers.

Copyright © Nick Heudecker
