Wednesday, December 16, 2015

Apache Spark with MongoDB (using pymongo-spark)

Here is a follow-up to the previous post on using Apache Spark to work with MongoDB data.  Please refer to that post for details on the setup.

This post is about using the "unstable" pymongo-spark library to create MongoDB-backed RDDs.

First, get the mongo-hadoop source tree from github:

git clone https://github.com/mongodb/mongo-hadoop.git


We need to build a jar and copy a Python script from the source.

For building the jar:

./gradlew jar

When done, the only jar we need is (the version number depends on the source you checked out):

spark/build/libs/mongo-hadoop-spark-1.5.0-SNAPSHOT.jar


Copy it to the lib directory under Spark.  It also needs to be added to the classpath when using pyspark or spark-submit.

And the Python file we need from the source tree is located at:

mongo-hadoop/spark/src/main/python/pymongo_spark.py


Save it under the same folder as the application script and remember to use the "--py-files" parameter to deploy it.

Also make sure you have pymongo installed, e.g. for Debian-based machines:

sudo apt-get install python-pymongo


Finally, the application script.  pymongo-spark needs to be activated before loading / saving RDDs.  Also, the structure of the loaded RDD is slightly different from that of an RDD created via mongo-hadoop.

from operator import add
from pyspark import SparkContext, SparkConf
import pymongo_spark

if __name__ == "__main__":

    # initiate pymongo-spark
    pymongo_spark.activate()

    conf = SparkConf() \
        .setAppName("SparkMongoDB2") \
        .set('spark.executor.extraClassPath', '/home/spark/spark-1.4.1-bin-hadoop2.6/lib/mongo-hadoop-spark-1.5.0-SNAPSHOT.jar')
    sc = SparkContext(conf=conf)

    # we know the number of businesses is much smaller than the number of reviews
    # create a dictionary of stars given to each business
    businessStar = sc.mongoRDD('mongodb://192.168.1.69:27017/test.yelp_business') \
        .map(lambda b: (b['business_id'], b['stars'])).collectAsMap()

    # create and process review RDD
    poorReview = sc.mongoRDD('mongodb://192.168.1.69:27017/test.yelp_review') \
        .filter(lambda r: r['stars'] < businessStar[r['business_id']]) \
        .map(lambda r: (r['business_id'], 1)) \
        .reduceByKey(add)

    # save output back to MongoDB
    poorReview.saveToMongoDB('mongodb://192.168.1.69:27017/test.poor_count')

    sc.stop()
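The structural difference mentioned above can be illustrated with plain Python values (the toy records below are illustrative, mimicking the Yelp documents used in these posts):

```python
# mongo-hadoop's newAPIHadoopRDD yields (key, document) pairs,
# so the lambdas in the previous post index with r[1][...]:
hadoop_style = ('566bb1eb563714b25d61ea02',
                {'business_id': 'vcNAWiLM4dR7D2nwwJ7nCA', 'stars': 2})
stars_hadoop = hadoop_style[1]['stars']

# pymongo-spark's mongoRDD yields the document itself,
# so the lambdas in this post index with r[...] directly:
pymongo_style = {'_id': '566bb1eb563714b25d61ea02',
                 'business_id': 'vcNAWiLM4dR7D2nwwJ7nCA', 'stars': 2}
stars_pymongo = pymongo_style['stars']

assert stars_hadoop == stars_pymongo == 2
```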


Here is an example of submitting the application:


bin/spark-submit  --master spark://192.168.1.10:7077 --conf "spark.eventLog.enabled=true" --py-files "/home/spark/test/spark_mongodb/pymongo_spark.py" --driver-class-path="/home/spark/spark-1.4.1-bin-hadoop2.6/lib/mongo-hadoop-spark-1.5.0-SNAPSHOT.jar" --executor-memory=256m ~/test/spark_mongodb/yelp_poor2.py







Tuesday, December 15, 2015

Apache Spark with MongoDB

Updates (2015-12-17): There are two ways for Apache Spark to access MongoDB data: mongo-hadoop or pymongo-spark.  This post is about using mongo-hadoop.  There is another post on using pymongo-spark.

Here is an example of running analytic tasks on Apache Spark using data from MongoDB.  Note that this is a trivial problem and it is much more efficient to solve it within MongoDB (or using PostgreSQL FDW).  But it is a good exercise to try out the mongo-hadoop solution.

My Apache Spark cluster consists of one UDOO and one Raspberry Pi with UDOO as both master and slave.  The CPU and memory resources on these devices are limited.

UDOO: 192.168.1.10. 1GB memory and 4x 1GHz cores
RPi: 192.168.1.7. 512MB memory and 1x 700MHz core

MongoDB is installed on an x86 machine (192.168.1.69).  Some Yelp sample data is loaded in the test database.  In particular, we are interested in the "business" and "review" collections:

> db.yelp_business.findOne()
{
        "_id" : ObjectId("566bb192563714b25d604a94"),
        "business_id" : "UsFtqoBl7naz8AVUBZMjQQ",
        "full_address" : "202 McClure St\nDravosburg, PA 15034",
        "hours" : {

        },
        "open" : true,
        "categories" : [
                "Nightlife"
        ],
        "city" : "Dravosburg",
        "review_count" : 4,
        "name" : "Clancy's Pub",
        "neighborhoods" : [ ],
        "longitude" : -79.88693,
        "state" : "PA",
        "stars" : 3.5,
        "latitude" : 40.350519,
        "attributes" : {
                "Happy Hour" : true,
                "Accepts Credit Cards" : true,
                "Good For Groups" : true,
                "Outdoor Seating" : false,
                "Price Range" : 1
        },
        "type" : "business"
}
> db.yelp_business.count()
61184
> db.yelp_review.findOne()
{
        "_id" : ObjectId("566bb1eb563714b25d61ea02"),
        "votes" : {
                "funny" : 0,
                "useful" : 2,
                "cool" : 0
        },
        "user_id" : "H1kH6QZV7Le4zqTRNxoZow",
        "review_id" : "RF6UnRTtG7tWMcrO2GEoAg",
        "stars" : 2,
        "date" : "2010-03-22",
        "text" : "Unfortunately, the frustration of being Dr. Goldberg's patient is a repeat of the experience I've had with so many other doctors in NYC -- good doctor, terrible staff.  It seems that his staff simply never answers the phone.  It usually takes 2 hours of repeated calling to get an answer.  Who has time for that or wants to deal with it?  I have run into this problem with many other doctors and I just don't get it.  You have office workers, you have patients with medical needs, why isn't anyone answering the phone?  It's incomprehensible and not work the aggravation.  It's with regret that I feel that I have to give Dr. Goldberg 2 stars.",
        "type" : "review",
        "business_id" : "vcNAWiLM4dR7D2nwwJ7nCA"
}
> db.yelp_review.count()
1569264


We are going to find, for each business, the number of reviews with rating lower than its rating.
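The logic can first be sketched in plain Python with hypothetical toy documents (no Spark or MongoDB needed): build a business_id-to-stars map, then count reviews rated below their business's rating.

```python
from collections import Counter

# Hypothetical toy documents standing in for the two collections.
businesses = [
    {'business_id': 'b1', 'stars': 3.5},
    {'business_id': 'b2', 'stars': 4.0},
]
reviews = [
    {'business_id': 'b1', 'stars': 2},   # below b1's 3.5 -> counted
    {'business_id': 'b1', 'stars': 5},   # not counted
    {'business_id': 'b2', 'stars': 3},   # below b2's 4.0 -> counted
    {'business_id': 'b2', 'stars': 1},   # counted
]

# business_id -> stars lookup, analogous to the collectAsMap() step
business_star = {b['business_id']: b['stars'] for b in businesses}

# count reviews rated below their business's rating,
# analogous to the filter / map / reduceByKey steps
poor_count = Counter(r['business_id'] for r in reviews
                     if r['stars'] < business_star[r['business_id']])

print(dict(poor_count))  # {'b1': 1, 'b2': 2}
```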

First, on the Spark machines, we need to download the MongoDB Java driver and the mongo-hadoop software:

wget http://central.maven.org/maven2/org/mongodb/mongo-java-driver/3.0.4/mongo-java-driver-3.0.4.jar
wget https://github.com/mongodb/mongo-hadoop/releases/download/r1.4.0/mongo-hadoop-core-1.4.0.jar


I saved them under the lib directory of Spark.

The logic is written in Python:

from operator import add
from pyspark import SparkContext, SparkConf

if __name__ == "__main__":

    conf = SparkConf() \
        .setAppName("SparkMongoDB") \
        .set('spark.executor.extraClassPath', '/home/spark/spark-1.4.1-bin-hadoop2.6/lib/mongo-hadoop-core-1.4.0.jar:/home/spark/spark-1.4.1-bin-hadoop2.6/lib/mongo-java-driver-3.0.4.jar')
    sc = SparkContext(conf=conf)

    keyClassName = 'org.apache.hadoop.io.Text'
    valueClassName = 'org.apache.hadoop.io.MapWritable'

    # two collections from MongoDB
    configReview = {'mongo.input.uri': 'mongodb://192.168.1.69:27017/test.yelp_review'}
    configBusiness = {'mongo.input.uri': 'mongodb://192.168.1.69:27017/test.yelp_business'}

    # we know the number of businesses is much smaller than the number of reviews
    # create a dictionary of stars given to each business
    businessStar = sc.newAPIHadoopRDD(
            inputFormatClass='com.mongodb.hadoop.MongoInputFormat',
            keyClass=keyClassName,
            valueClass=valueClassName,
            conf=configBusiness) \
        .map(lambda b: (b[1]['business_id'], b[1]['stars'])).collectAsMap()

    # create and process review RDD
    poorReview = sc.newAPIHadoopRDD(
            inputFormatClass='com.mongodb.hadoop.MongoInputFormat',
            keyClass=keyClassName,
            valueClass=valueClassName,
            conf=configReview) \
        .filter(lambda r: r[1]['stars'] < businessStar[r[1]['business_id']]) \
        .map(lambda r: (r[1]['business_id'], 1)) \
        .reduceByKey(add)

    ''' This alternative is more elegant but requires more memory
    businessStar = sc.newAPIHadoopRDD(
            inputFormatClass='com.mongodb.hadoop.MongoInputFormat',
            keyClass=keyClassName,
            valueClass=valueClassName,
            conf=configBusiness) \
        .map(lambda b: (b[1]['business_id'], b[1]['stars']))

    poorReview = sc.newAPIHadoopRDD(
            inputFormatClass='com.mongodb.hadoop.MongoInputFormat',
            keyClass=keyClassName,
            valueClass=valueClassName,
            conf=configReview) \
        .map(lambda r: (r[1]['business_id'], r[1]['stars'])) \
        .join(businessStar) \
        .filter(lambda (business_id, (rStar, bStar)): bStar > rStar) \
        .map(lambda (business_id, (rStar, bStar)): (business_id, 1)) \
        .reduceByKey(add)
    '''

    # save output back to MongoDB
    outConfig = {'mongo.output.uri': 'mongodb://192.168.1.69:27017/test.poor_count'}
    poorReview.saveAsNewAPIHadoopFile(
            path='file:///not-used',
            outputFormatClass='com.mongodb.hadoop.MongoOutputFormat',
            keyClass=keyClassName,
            valueClass=valueClassName,
            conf=outConfig)

    sc.stop()


Start the cluster ("sbin/start-all.sh") and submit the job, e.g.:

bin/spark-submit  --master spark://192.168.1.10:7077 --conf "spark.eventLog.enabled=true" --driver-class-path="/home/spark/spark-1.4.1-bin-hadoop2.6/lib/mongo-hadoop-core-1.4.0.jar:/home/spark/spark-1.4.1-bin-hadoop2.6/lib/mongo-java-driver-3.0.4.jar" --executor-memory=256m ~/test/spark_mongodb/yelp_poor.py

It took 20 minutes to complete.  Swapping was a big issue due to lack of memory on the ARM devices.





Next, I will try pymongo-spark to see if skipping the Hadoop layer improves performance (however, as of version 0.1, pymongo-spark still uses mongo-hadoop as the underlying logic).


Saturday, December 12, 2015

Playing with PostgreSQL FDW for MongoDB

While it was interesting to read articles bashing MongoDB for basing its latest BI connector on PostgreSQL, I wondered how the (alleged) Python-implemented Foreign Data Wrapper (FDW) performs.

First I got PostgreSQL 9.4 and Python dev installed on my openSUSE Tumbleweed machine:

postgresql94 postgresql94-server postgresql94-devel postgresql94-contrib python-devel python-setuptools


Then build and install Multicorn, the PostgreSQL extension that the (alleged) MongoDB FDW is based on:

git clone git://github.com/Kozea/Multicorn.git
cd Multicorn
sudo make
sudo make install


Also install the FDW:

git clone https://github.com/asya999/yam_fdw.git
cd yam_fdw
sudo python setup.py install


Start the PostgreSQL server if it is not running:

sudo systemctl start postgresql.service


To test the performance, I loaded some sample Yelp data into MongoDB 3.0.7.  Here is the format of the "business" data:

> db.yelp_business.findOne()
{
        "_id" : ObjectId("566bb192563714b25d604a94"),
        "business_id" : "UsFtqoBl7naz8AVUBZMjQQ",
        "full_address" : "202 McClure St\nDravosburg, PA 15034",
        "hours" : {

        },
        "open" : true,
        "categories" : [
                "Nightlife"
        ],
        "city" : "Dravosburg",
        "review_count" : 4,
        "name" : "Clancy's Pub",
        "neighborhoods" : [ ],
        "longitude" : -79.88693,
        "state" : "PA",
        "stars" : 3.5,
        "latitude" : 40.350519,
        "attributes" : {
                "Happy Hour" : true,
                "Accepts Credit Cards" : true,
                "Good For Groups" : true,
                "Outdoor Seating" : false,
                "Price Range" : 1
        },
        "type" : "business"
}
> db.yelp_business.count()
61184


And here is sample of the review data:

> db.yelp_review.findOne()
{
        "_id" : ObjectId("566bb1eb563714b25d61ea02"),
        "votes" : {
                "funny" : 0,
                "useful" : 2,
                "cool" : 0
        },
        "user_id" : "H1kH6QZV7Le4zqTRNxoZow",
        "review_id" : "RF6UnRTtG7tWMcrO2GEoAg",
        "stars" : 2,
        "date" : "2010-03-22",
        "text" : "Unfortunately, the frustration of being Dr. Goldberg's patient is a repeat of the experience I've had with so many other doctors in NYC -- good doctor, terrible staff.  It seems that his staff simply never answers the phone.  It usually takes 2 hours of repeated calling to get an answer.  Who has time for that or wants to deal with it?  I have run into this problem with many other doctors and I just don't get it.  You have office workers, you have patients with medical needs, why isn't anyone answering the phone?  It's incomprehensible and not work the aggravation.  It's with regret that I feel that I have to give Dr. Goldberg 2 stars.",
        "type" : "review",
        "business_id" : "vcNAWiLM4dR7D2nwwJ7nCA"
}
> db.yelp_review.count()
1569264


So, say I want to join the two collections in PostgreSQL to find out, for each business, how many reviews give a lower rating than its existing rating.  Here are the sample SQL statements executed in psql:

-- create the Multicorn extension
CREATE EXTENSION multicorn;

-- define the mongodb FDW
create server mongodb_yelp foreign data wrapper multicorn options (wrapper 'yam_fdw.Yamfdw');
-- turn on debug if necessary
--create server mongodb_yelp foreign data wrapper multicorn options (wrapper 'yam_fdw.Yamfdw', debug 'True');

-- create the foreign tables
create foreign table yelp_review ("_id" varchar, "review_id" varchar, "user_id" varchar, "business_id" varchar, "stars" numeric, "date" date, "text" varchar) server mongodb_yelp options (db 'test', collection 'yelp_review');
create foreign table yelp_business ("_id" varchar, "business_id" varchar, "name" varchar, "stars" numeric, "longitude" float8, "latitude" float8, "full_address" varchar, "type" varchar) server mongodb_yelp options (db 'test', collection 'yelp_business');

-- use SQL to join the collections
select b.name, count(1)
from yelp_business b
left outer join yelp_review r
  on r.business_id = b.business_id
  and r.stars < b.stars
group by b.name
fetch first 100 rows only;


The performance is nothing to write home about, but it is better than I expected.  It is hard to push predicates down to the foreign data source.
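Push-down here means translating the SQL predicates into a MongoDB query document, so the server filters rows instead of the wrapper fetching entire collections.  A minimal sketch of the idea, assuming quals arrive as objects with field_name / operator / value attributes (the Qual namedtuple below is a stand-in for illustration, not Multicorn's actual class):

```python
from collections import namedtuple

# Hypothetical stand-in for the qual objects an FDW receives.
Qual = namedtuple('Qual', ['field_name', 'operator', 'value'])

# SQL operators we can translate to MongoDB query operators.
# None means plain equality, i.e. {field: value}.
_OPS = {'=': None, '<': '$lt', '>': '$gt', '<=': '$lte', '>=': '$gte'}

def quals_to_filter(quals):
    """Translate push-down-able quals into a MongoDB filter document.

    Quals we cannot translate are skipped; PostgreSQL re-checks every
    qual on the rows returned, so skipping is safe (just slower).
    """
    spec = {}
    for q in quals:
        if q.operator not in _OPS:
            continue  # left for PostgreSQL to evaluate
        op = _OPS[q.operator]
        spec[q.field_name] = q.value if op is None else {op: q.value}
    return spec

print(quals_to_filter([Qual('stars', '<', 3), Qual('city', '=', 'Dravosburg')]))
# -> {'stars': {'$lt': 3}, 'city': 'Dravosburg'}
```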

Anyway, I did fork the FDW and made some minor changes.

PS.  Not sure if the bashing (here, here) of MongoDB's BI connector was caused by a conflict of interest (after all, SlamData is an analytics tool for NoSQL).  But I am no fanboy of any particular software.

I just use whatever software that suits my needs.  The more choices I have, the better off I will be.

Saturday, December 5, 2015

Building MongoDB 3.0.x for ARM (armv7l) platforms

MongoDB does not officially support the 32-bit ARM platform. See the MongoDB Jira discussions for details.

Also note that for some Linux distributions of ARM, you can install MongoDB directly with their package managers, e.g. on Arch Linux: "pacman -Sy mongodb".  (But the latest build on Arch Linux ARM seems to be having issues.)

Updates (2016-01-18): Please note that the latest Arch Linux ARM package of MongoDB 3.2 seems to be working fine again.

Here are steps to compile MongoDB on 32-bit ARM.  I did it on my UDOO board with Arch Linux ARM.  But the procedure should be similar for other armv7l boards (e.g. Beaglebone Black).

While there are tutorials on how to compile MongoDB on ARM machines, most of them are based on forks such as this or this.  But these forks are pretty old (version 1.x or 2.x only).  Since I wanted to set up an ARM replica of my x86 primary running MongoDB 3.0.7, I decided to try to compile directly from the MongoDB git source.


Preparation

I will be compiling the source directly on the ARM board instead of cross-compiling.

Depending on your hardware and available RAM, you may need to add a swap file in order to build MongoDB successfully.  For reference, with the UDOO setup of 1GB of RAM, 1GB of swap space and swappiness set to 1, the compilation process could hit 500MB of swap usage.

Make sure we have the latest software, compilers, etc.  On Arch Linux ARM, that means running:


sudo pacman -Syu
sudo pacman -Sy base-devel scons git-core openssl boost boost-libs


If you are using other Linux distributions, you may need other packages instead.  e.g. for Debian / Ubuntu with apt-get, you probably need "build-essential" instead of "base-devel":


sudo apt-get update
sudo apt-get upgrade
sudo apt-get install build-essential libssl-dev git scons libboost-filesystem-dev libboost-program-options-dev libboost-system-dev libboost-thread-dev



Get the MongoDB source

Get the MongoDB source from github and check out the appropriate version.  I will be compiling version 3.0.7:


mkdir ~/test
cd ~/test
git clone git://github.com/mongodb/mongo.git
cd mongo
git checkout r3.0.7


Configure the V8 build for ARM

MongoDB depends on the V8 JavaScript engine.  There are two versions of V8 in the source tree:  "mongo/src/third_party/v8" and "mongo/src/third_party/v8-3.25".  While the 3.25 version of V8 supports both ARM and ARM64, its SConscript is missing definitions for 32-bit ARM.

(Note that 3.25 is not the default version.  There are problems with some of the memory tests.  And it seems that MongoDB will move to SpiderMonkey soon.)

Edit the file mongo/src/third_party/v8-3.25/SConscript to add definitions for "armv7l" accordingly (or you may get the patched file here):

@@ -77,6 +77,9 @@ LIBRARY_FLAGS = { 
    'arch:arm64': { 
      'CPPDEFINES':   ['V8_TARGET_ARCH_ARM64'], 
    }, 
+    'arch:arm': { 
+      'CPPDEFINES':   ['V8_TARGET_ARCH_ARM'], 
+    }, 
  }, 
  'msvc': { 
    'all': { 
@@ -308,6 +311,27 @@ SOURCES = { 
    arm64/stub-cache-arm64.cc 
    arm64/utils-arm64.cc 
    """), 
+  'arch:arm': Split(""" 
+    arm/assembler-arm.cc 
+    arm/builtins-arm.cc 
+    arm/code-stubs-arm.cc 
+    arm/codegen-arm.cc
+    arm/constants-arm.cc 
+    arm/cpu-arm.cc 
+    arm/debug-arm.cc 
+    arm/deoptimizer-arm.cc 
+    arm/disasm-arm.cc 
+    arm/frames-arm.cc 
+    arm/full-codegen-arm.cc 
+    arm/ic-arm.cc 
+    arm/lithium-arm.cc 
+    arm/lithium-codegen-arm.cc 
+    arm/lithium-gap-resolver-arm.cc 
+    arm/macro-assembler-arm.cc 
+    arm/regexp-macro-assembler-arm.cc 
+    arm/simulator-arm.cc 
+    arm/stub-cache-arm.cc 
+    """), 
  'os:freebsd': ['platform-freebsd.cc', 'platform-posix.cc'], 
  'os:openbsd': ['platform-openbsd.cc', 'platform-posix.cc'],
   'os:linux':   ['platform-linux.cc', 'platform-posix.cc'], 
@@ -362,6 +386,8 @@ def get_options(): 
        arch_string = 'arch:x64' 
    elif processor == 'aarch64': 
        arch_string = 'arch:arm64' 
+    elif processor == 'armv7l': 
+        arch_string = 'arch:arm' 
    else: 
        assert False, "Unsupported architecture: " + processor

Also, depending on your environment, you may need to patch the code, otherwise the compiled mongod and mongo binaries will cause a segmentation fault.


Compile and install

Next we can compile and install MongoDB:


cd ~/test/mongo/

scons -j 2 --ssl --wiredtiger=off --js-engine=v8-3.25 --c++11=off --disable-warnings-as-errors CXXFLAGS="-std=gnu++11" all

scons --ssl --wiredtiger=off --js-engine=v8-3.25 --c++11=off --disable-warnings-as-errors CXXFLAGS="-std=gnu++11" --prefix=/home/cho/apps/mongo install

Although the i.MX6Q CPU on my UDOO has 4 cores, I used "-j 2" instead of "-j 4" for stability reasons.  Adjust according to your CPU power / memory usage.

The WiredTiger storage engine does not support 32-bit platforms.  So we are disabling it here with "--wiredtiger=off".

The "--js-engine" parameter is to select the non-default V8 engine to use.

Standard C++11 is disabled with "--c++11=off" and the "-std=gnu++11" flag is used instead.  The Google Performance Tools (gperftools) have problems with the "-std=c++11" flag when compiled for ARM.  This could be an issue if you try to compile the latest MongoDB source, as "--c++11=on" has been made mandatory.  You can probably still disable it by editing SConstruct if necessary, or use the system allocator ("--allocator=system") instead of tcmalloc.

The "--prefix" parameter specifies where to install MongoDB.  You probably want to specify another location.

If you only want to build the database, there is no need to specify "all".  Alternatively, specify "core" to build mongo, mongod, and mongos.

Depending on the toolchain you are using, the assembler may warn during compilation about the SWP/SWPB instructions being deprecated:

Warning: swp{b} use is deprecated for ARMv6 and ARMv7

The generated code should be fine as the Linux kernel can emulate these instructions (you may run "dmesg | grep -i swp" to check).  I am not sure how to avoid the warnings (specifying various gcc parameters such as -mcpu and -march seemed ineffective) or whether they affect performance.  Anyone? (See updates below.)

You can also run "cat /proc/cpu/swp_emulation" to see the emulation count.


Build the tools (optional)

You may also want to build the tools. Install Go first:

sudo pacman -Sy go


Then get the source and build:

cd ~/test
git clone https://github.com/mongodb/mongo-tools
cd mongo-tools
git checkout r3.0.7
./build.sh


(Update 2015-12-08): It turns out that the boost library bundled with MongoDB is pretty old and contains inline assembly with the SWP instruction. You can apply the patch to the file src/third_party/boost/boost/smart_ptr/detail/spinlock_gcc_arm.hpp

Or, use the system boost library by specifying the "--use-system-boost" parameter when building.


Friday, December 4, 2015

MongoDB 3.0.7 segmentation fault on ARM

Updates (2015-12-28): mongodb-3.2.0-3-armv7h of Arch Linux ARM seems to be working fine.


The latest stable version (3.0.7) of MongoDB is having problems on Arch Linux ARM platforms.

Out of curiosity, I checked out the code from github and tried to compile and reproduce it. The segmentation fault seems to be caused by std::map.  Here is the gdb backtrace:

#0  std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string (this=0xad3e28, 
    __str=<error reading variable: Cannot access memory at address 0x0>)
    at /build/gcc/src/gcc-build/armv7l-unknown-linux-gnueabihf/libstdc++-v3/include/bits/basic_string.tcc:617
#1  0x00459b08 in std::pair<std::string const, mongo::optionenvironment::Value>::pair<std::string const&, 0u>(std::tuple<std::string const&>&, std::tuple<>&, std::_Index_tuple<0u>, std::_Index_tuple<>) (__tuple2=<synthetic pointer>, 
    __tuple1=..., this=0x0) at /usr/include/c++/5.2.0/tuple:1172
#2  std::pair<std::string const, mongo::optionenvironment::Value>::pair<std::string const&>(std::piecewise_construct_t, std::tuple<std::string const&>, std::tuple<>) (__second=..., __first=..., this=0x0)
    at /usr/include/c++/5.2.0/tuple:1161
#3  __gnu_cxx::new_allocator<std::_Rb_tree_node<std::pair<std::string const, mongo::optionenvironment::Value> > >::construct<std::pair<std::string const, mongo::optionenvironment::Value>, std::piecewise_construct_t const&, std::tuple<std::string const&>, std::tuple<> >(std::pair<std::string const, mongo::optionenvironment::Value>*, std::piecewise_construct_t const&, std::tuple<std::string const&>&&, std::tuple<>&&) (__p=0x0, this=0xbefff558)
    at /usr/include/c++/5.2.0/ext/new_allocator.h:120
#4  std::allocator_traits<std::allocator<std::_Rb_tree_node<std::pair<std::string const, mongo::optionenvironment::Value> > > >::_S_construct<std::pair<std::string const, mongo::optionenvironment::Value>, std::piecewise_construct_t const&, std::tuple<std::string const&>, std::tuple<> >(std::allocator<std::_Rb_tree_node<std::pair<std::string const, mongo::optionenvironment::Value> > >&, std::pair<std::string const, mongo::optionenvironment::Value>*, std::piecewise_construct_t const&, std::tuple<std::string const&>&&, std::tuple<>&&) (
    __p=<optimized out>, __a=...)
    at /usr/include/c++/5.2.0/bits/alloc_traits.h:256
#5  std::allocator_traits<std::allocator<std::_Rb_tree_node<std::pair<std::string const, mongo::optionenvironment::Value> > > >::construct<std::pair<std::string const, mongo::optionenvironment::Value>, std::piecewise_construct_t const&, std::tuple<std::string const&>, std::tuple<> >(std::allocator<std::_Rb_tree_node<std::pair<std::string const, mongo::optionenvironment::Value> > >&, std::pair<std::string const, mongo::optionenvironment::Value>*, std::piecewise_construct_t const&, std::tuple<std::string const&>&&, std::tuple<>&&) (__p=<optimized out>, 
    __a=...) at /usr/include/c++/5.2.0/bits/alloc_traits.h:402
#6  std::_Rb_tree<std::string, std::pair<std::string const, mongo::optionenvironment::Value>, std::_Select1st<std::pair<std::string const, mongo::optionenvironment::Value> >, std::less<std::string>, std::allocator<std::pair<std::string const, mongo::optionenvironment::Value> > >::_M_construct_node<std::piecewise_construct_t const&, std::tuple<std::string const&>, std::tuple<> >(std::_Rb_tree_node<std::pair<std::string const, mongo::optionenvironment::Value> >*, std::piecewise_construct_t const&, std::tuple<std::string const&>&&, std::tuple<>&&) (
    __node=0xad3e18, this=0xbefff558)
    at /usr/include/c++/5.2.0/bits/stl_tree.h:529
#7  std::_Rb_tree<std::string, std::pair<std::string const, mongo::optionenvironment::Value>, std::_Select1st<std::pair<std::string const, mongo::optionenvironment::Value> >, std::less<std::string>, std::allocator<std::pair<std::string const, mongo::optionenvironment::Value> > >::_M_create_node<std::piecewise_construct_t const&, std::tuple<std::string const&>, std::tuple<> >(std::piecewise_construct_t const&, std::tuple<std::string const&>&&, std::tuple<>&&) (
    this=0xbefff558) at /usr/include/c++/5.2.0/bits/stl_tree.h:546
#8  std::_Rb_tree<std::string, std::pair<std::string const, mongo::optionenvironment::Value>, std::_Select1st<std::pair<std::string const, mongo::optionenvironment::Value> >, std::less<std::string>, std::allocator<std::pair<std::string const, mongo::optionenvironment::Value> > >::_M_emplace_hint_unique<std::piecewise_construct_t const&, std::tuple<std::string const&>, std::tuple<> >(std::_Rb_tree_const_iterator<std::pair<std::string const, mongo::optionenvironment::Value> >, std::piecewise_construct_t const&, std::tuple<std::string const&>&&, std::tuple<>&&) (this=0xbefff558, __pos=...)
    at /usr/include/c++/5.2.0/bits/stl_tree.h:2170
#9  0x0046225c in std::map<std::string, mongo::optionenvironment::Value, std::less<std::string>, std::allocator<std::pair<std::string const, mongo::optionenvironment::Value> > >::operator[] (__k="authenticationDatabase", this=0xbefff4c4)
    at /usr/include/c++/5.2.0/bits/stl_map.h:483
#10 mongo::optionenvironment::OptionSection::getDefaults (this=0xbefff51c, 
    this@entry=0xad4038, values=0xbefff4c4, values@entry=0xbefff558)
    at src/mongo/util/options_parser/option_section.cpp:508
#11 0x00462134 in mongo::optionenvironment::OptionSection::getDefaults (
    this=0xbefff550, this@entry=0xbefff68c, values=0x4, 
    values@entry=0xbefff558)
    at src/mongo/util/options_parser/option_section.cpp:514
#12 0x0046984c in mongo::optionenvironment::OptionsParser::addDefaultValues (
    this=this@entry=0xab5750 <mongo::optionenvironment::startupOptions>, 
    options=..., 
    environment=0xab5704 <mongo::optionenvironment::startupOptionsParsed>, 
    environment@entry=0x474934 <mongo::optionenvironment::_mongoInitializerFunction_StartupOptions_Parse(mongo::InitializerContext*)+84>)
    at src/mongo/util/options_parser/options_parser.cpp:845
#13 0x0046edf8 in mongo::optionenvironment::OptionsParser::run (
    this=0xab5750 <mongo::optionenvironment::startupOptions>, 
    this@entry=0xbefff79c, options=..., 
Python Exception <class 'ValueError'> Cannot find type const std::map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::_Rep_type: 
    argv=std::vector of length 2, capacity 2 = {...}, 
    env=std::map with 21 elements, 
    environment=0xab5704 <mongo::optionenvironment::startupOptionsParsed>)
    at src/mongo/util/options_parser/options_parser.cpp:1012
#14 0x00474934 in mongo::optionenvironment::_mongoInitializerFunction_StartupOptions_Parse (context=0xbefff81c)
    at src/mongo/util/options_parser/options_parser_init.cpp:46
#15 0x0025eed4 in std::_Function_handler<mongo::Status (mongo::InitializerContext*), mongo::Status (*)(mongo::InitializerContext*)>::_M_invoke(std::_Any_data const&, mongo::InitializerContext*&&) (__functor=..., __args#0=<optimized out>)
    at /usr/include/c++/5.2.0/functional:1857
#16 0x0025f330 in std::function<mongo::Status (mongo::InitializerContext*)>::operator()(mongo::InitializerContext*) const (
    __args#0=<error reading variable: Cannot access memory at address 0x0>, 
    this=0xbefff80c) at /usr/include/c++/5.2.0/functional:2271
#17 mongo::Initializer::execute (this=0xab3dc0 <mongo::getGlobalInitializer()::theGlobalInitializer>, 
    Python Exception <class 'ValueError'> Cannot find type const mongo::InitializerContext::EnvironmentMap::_Rep_type: args=std::vector of length 2, capacity 2 = {...}, env=std::map with 21 elements) at src/mongo/base/initializer.cpp:58
#18 0x0025f8cc in mongo::runGlobalInitializers (
    Python Exception <class 'ValueError'> Cannot find type const mongo::InitializerContext::EnvironmentMap::_Rep_type: 
    args=std::vector of length 2, capacity 2 = {...}, 
    env=std::map with 21 elements) at src/mongo/base/initializer.cpp:71
#19 0x0025fb30 in mongo::runGlobalInitializers (argc=argc@entry=2, 
    argv=argv@entry=0xbefffc64, envp=<optimized out>, envp@entry=0xbefffc70)
    at src/mongo/base/initializer.cpp:90
#20 0x002600a0 in mongo::runGlobalInitializersOrDie (argc=argc@entry=2, 
    argv=argv@entry=0xbefffc64, envp=envp@entry=0xbefffc70)
    at src/mongo/base/initializer.cpp:94
#21 0x0025332c in _main (argc=argc@entry=2, argv=argv@entry=0xbefffc64, 
    envp=envp@entry=0xbefffc70) at src/mongo/shell/dbshell.cpp:601
#22 0x0023aeac in main (argc=2, argv=0xbefffc64, envp=0xbefffc70)
    at src/mongo/shell/dbshell.cpp:924



Looking at the getDefaults function in src/mongo/util/options_parser/option_section.cpp, the operator[] of std::map seems to have caused the failure when allocating a new node:



    Status OptionSection::getDefaults(std::map<Key, Value>* values) const {

        std::list<OptionDescription>::const_iterator oditerator;
        for (oditerator = _options.begin(); oditerator != _options.end(); oditerator++) {
            if (!oditerator->_default.isEmpty()) {
                (*values)[oditerator->_dottedName] = oditerator->_default;
            }
        }

        std::list<OptionSection>::const_iterator ositerator;
        for (ositerator = _subSections.begin(); ositerator != _subSections.end(); ositerator++) {
            ositerator->getDefaults(values);
        }

        return Status::OK();
    }

I haven't figured out why, but using an equivalent insert seems to get around the issue:

-            (*values)[oditerator->_dottedName] = oditerator->_default;
+            //(*values)[oditerator->_dottedName] = oditerator->_default;
+            (*((values->insert(make_pair(oditerator->_dottedName, std::map<Key, Value>::mapped_type()))).first)).second = oditerator->_default;


Updates (2015-12-06): Building MongoDB with "--opt=off" (i.e. using -O0 instead of -O3) also works.  Using g++ 5.2.0.