
Friday, October 23, 2009

simple program to save a file from C++ to (MongoDB) GridFS


MongoDB C++ Tutorial http://www.mongodb.org/pages/viewpage.action?pageId=133415


The best way to start is to build MongoDB from source (Ubuntu/Debian).

$sudo apt-get install g++ scons libpcre++-dev libmozjs-dev libpcap-dev libboost-dev
$cd /usr/src
$sudo git clone git://github.com/mongodb/mongo.git
$cd mongo
$scons
$scons --prefix=/opt/mongo install
$cd ~
$gvim file_to_gridfs.cpp
#include <iostream>
#include <string>
#include <vector>

#include <boost/algorithm/string.hpp>

#include <mongo/client/dbclient.h>
#include <mongo/client/gridfs.h>

// build: see the g++ commands after this listing

using namespace std;
using namespace mongo;

int main(int argc, const char **argv) {
   if (argc != 2) {
      cerr << "Usage: " << argv[0] << " local_file" << endl;
      return -12;
   }

   string fileName = argv[1];

   // use the last path component as the GridFS file name
   vector<string> strs;
   boost::split(strs, fileName, boost::is_any_of("/"));

   DBClientConnection c;
   c.connect("localhost");
   cout << "connected ok" << endl;

   // db "test", collection prefix "testcpp" (testcpp.files / testcpp.chunks)
   GridFS gfs(c, "test", "testcpp");
   gfs.storeFile(fileName, strs[strs.size()-1]);
   cout << "file stored" << endl;

   return 0;
}


$g++ -o file_to_gridfs.o -c -I/opt/mongo/include file_to_gridfs.cpp
$g++ -o file_to_gridfs file_to_gridfs.o -L/opt/mongo/lib -lmongoclient -lboost_thread -lboost_filesystem
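
To check that the upload worked, here is a small pymongo sketch (assuming the old driver API used in the posts below) that looks up the metadata document GridFS should write into test.testcpp.files; call it with the same path you passed to file_to_gridfs:

#!/usr/bin/env python
# sketch: look up the metadata of a file stored by file_to_gridfs
# (db "test", prefix "testcpp", as in the C++ code above)
import sys
import os.path
from pymongo.connection import Connection

db = Connection("localhost", 27017)["test"]

name = os.path.basename(sys.argv[1])          # same name the C++ program used
print db["testcpp.files"].find_one({"filename": name})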

Tuesday, October 20, 2009

mongoDB gridfs and sharding

If you want to use GridFS with sharded chunks, beware: the example has a mistake, and sharding by "_id" doesn't work.

Working example.
First ....
http://www.mongodb.org/display/DOCS/A+Sample+Configuration+Session

and then ...
$mongo
> use admin
switched to db admin
> db.runCommand( { shardcollection : "test.dexters.chunks", key : { n : 1 } } )
{"collectionsharded" : "test.dexters.chunks" , "ok" : 1}

$vim test_load.py
====================================================
#!/usr/bin/env python
import sys
import os.path
from pymongo.connection import Connection
from gridfs import GridFS

connection = Connection("localhost", 27017)
db = connection["test"]

name = os.path.basename(sys.argv[1])

fs = GridFS(db)
fp = fs.open(name, 'w', 'dexters')

for l in open(sys.argv[1]):
    fp.write(l)
fp.close()
====================================================
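
Run the loader with a local path, e.g. $./test_load.py /etc/hostname. To read the file back, here is a small sketch that reuses the same (old) GridFS API and the 'dexters' prefix:

====================================================
#!/usr/bin/env python
# sketch: read back a file stored by test_load.py (prefix "dexters")
import sys
import os.path
from pymongo.connection import Connection
from gridfs import GridFS

db = Connection("localhost", 27017)["test"]
fs = GridFS(db)

name = os.path.basename(sys.argv[1])
fp = fs.open(name, 'r', 'dexters')   # same API and prefix as the writer above
print len(fp.read()), 'bytes read back'
fp.close()
====================================================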

There are at least two sharding strategies: by files_id or by (n, _id).
Sharding by files_id keeps all chunks of a file on one shard, so within a single file it is like no RAID or RAID-1; sharding by n spreads the chunks of one file across many servers, which behaves more like RAID-0/RAID-10. At the collection level the performance is always RAID-0/RAID-10-like ;-)
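
For reference, here is a pymongo sketch of how the two shard keys could be declared (hypothetical session; a chunks collection can only be sharded once, so pick one):

====================================================
#!/usr/bin/env python
# sketch: the two shard-key choices for the GridFS chunks collection
from pymongo.connection import Connection
from pymongo.son import SON   # the command name must come first, so use SON

admin = Connection("localhost", 27017)["admin"]

# by files_id: all chunks of one file stay on one shard (RAID-1-like per file)
print admin["$cmd"].find_one(
    SON([("shardcollection", "test.dexters.chunks"),
         ("key", {"files_id": 1})]))

# by n: chunks of the same file are spread across shards (RAID-0-like)
print admin["$cmd"].find_one(
    SON([("shardcollection", "test.dexters.chunks"),
         ("key", {"n": 1})]))
====================================================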

Friday, October 16, 2009

mongoDB rocks



I have found this project very useful for big sites and heavy load.

A quick MongoDB load test for comparison:
$vim ./test_load.py
======================================
#!/usr/bin/env python
from pymongo.connection import Connection

connection = Connection("localhost", 27017)
db = connection["test"]
# list existing collections in the "test" db
for collection in db.collection_names():
    print collection

collection = db["testCollection"]

# insert 10000 small documents
for i in range(10000):
    collection.insert({"i": i, "body": 'a - %s' % (i,)})
======================================
$time ./test_load.py
real 0m5.301s
user 0m4.160s
sys 0m0.230s
$du -hs /data/db/
81M /data/db/
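
To sanity-check the inserts, a companion read sketch (same db and collection names as above):

======================================
#!/usr/bin/env python
# sketch: read back what test_load.py inserted
from pymongo.connection import Connection

connection = Connection("localhost", 27017)
collection = connection["test"]["testCollection"]

print collection.count()                     # expect 10000
for doc in collection.find({"i": {"$lt": 5}}):
    print doc["i"], doc["body"]
======================================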