Showing posts with label python. Show all posts

Tuesday, October 20, 2009

mongoDB gridfs and sharding

If you want to use GridFS with a sharded chunks collection, note that the example has a mistake: sharding by "_id" doesn't work.

Working example.
First, go through the sample configuration session:
http://www.mongodb.org/display/DOCS/A+Sample+Configuration+Session

and then shard the chunks collection by "n":
$mongo
> use admin
switched to db admin
> db.runCommand( { shardcollection : "test.dexters.chunks", key : { n : 1 } } )
{"collectionsharded" : "test.dexters.chunks" , "ok" : 1}

$vim test_load.py
====================================================
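#!/usr/bin/env python
# Usage: ./test_load.py <path-to-file>
# Stores the file in GridFS under the "dexters" root collection,
# i.e. in test.dexters.files and test.dexters.chunks (2009-era pymongo API).
import sys
import os.path
from pymongo.connection import Connection
from gridfs import GridFS

connection = Connection("localhost", 27017)
db = connection["test"]

name = os.path.basename(sys.argv[1])

fs = GridFS(db)
fp = fs.open(name, 'w', 'dexters')  # write mode, root collection "dexters"

src = open(sys.argv[1], 'rb')  # binary mode, so non-text files survive intact
for l in src:
    fp.write(l)
src.close()
fp.close()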
====================================================
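To see why "n" is usable as a shard key at all, it helps to look at what the script leaves in dexters.chunks. Below is a minimal pure-Python sketch (no MongoDB needed) of how a GridFS writer cuts a file into chunk documents with files_id, n and data fields. The 256 KiB chunk size and the string files_id are assumptions for illustration; real drivers use their own default chunk size and an ObjectId for files_id.

```python
# Toy model of how a GridFS writer splits a file into chunk documents.
# Assumptions: a 256 KiB chunk size (the real default depends on the driver
# version) and plain dicts standing in for BSON documents.
CHUNK_SIZE = 256 * 1024  # assumed chunk size in bytes


def split_into_chunks(files_id, data, chunk_size=CHUNK_SIZE):
    """Return one {"files_id", "n", "data"} dict per chunk, n counting from 0."""
    chunks = []
    for n, offset in enumerate(range(0, len(data), chunk_size)):
        chunks.append({
            "files_id": files_id,                     # points back to <root>.files
            "n": n,                                   # chunk sequence number
            "data": data[offset:offset + chunk_size],
        })
    return chunks


if __name__ == "__main__":
    payload = b"x" * (600 * 1024)  # a 600 KiB dummy file
    chunks = split_into_chunks("file-1", payload)  # "file-1" is a hypothetical id
    print([(c["n"], len(c["data"])) for c in chunks])
    # [(0, 262144), (1, 262144), (2, 90112)]
```

Every chunk of a file shares the same files_id, and n only distinguishes chunks within that file, which is what the two sharding strategies below choose between.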

There are at least two sharding strategies: by files_id or by (n, files_id).
Sharding by files_id is like no RAID or RAID-1 at the file level (all chunks of a file stay on one shard); sharding by n across many servers can behave like RAID-0 or RAID-10 (a file's chunks are spread across shards). At the collection level, performance is always like RAID-0/RAID-10 ;-)
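The RAID analogy can be sketched with a toy model of range-based sharding, assuming three shards and hand-picked, hypothetical split points (a real cluster chooses and rebalances these itself). The helper names shard_for and placement, and the integer file ids, are made up for this example.

```python
# Toy model of range-based sharding for a GridFS chunks collection.
# The split points are hypothetical; a real cluster manages them itself.
import bisect


def shard_for(key, split_points):
    """Range partitioning: split points [a, b] mean shard 0 (< a), 1 ([a, b)), 2 (>= b)."""
    return bisect.bisect_right(split_points, key)


def placement(file_ids, chunks_per_file, key_fn, split_points):
    """Map each shard number to the (files_id, n) chunks it would hold."""
    layout = {}
    for files_id in file_ids:
        for n in range(chunks_per_file):
            shard = shard_for(key_fn(files_id, n), split_points)
            layout.setdefault(shard, []).append((files_id, n))
    return layout


files = [1, 2, 3]  # hypothetical files_id values

# shard key {n: 1}: the chunks of one file are striped across shards (RAID-0 like)
by_n = placement(files, 6, lambda f, n: n, split_points=[2, 4])

# shard key {files_id: 1}: every chunk of a file stays on a single shard
by_file = placement(files, 6, lambda f, n: f, split_points=[2, 3])

for label, layout in (("n", by_n), ("files_id", by_file)):
    for shard in sorted(layout):
        print(label, shard, layout[shard])
```

With key n, the six chunks of file 1 land on shards 0, 1 and 2, so one large file is striped; with key files_id, all six chunks of file 1 sit on shard 0, while different files still spread across the cluster.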

Friday, October 16, 2009

mongoDB rocks



I have found this project very useful for big sites under heavy load.

mongodb, comparison.
$vim ./test_load.py
======================================
#!/usr/bin/env python
from pymongo.connection import Connection

connection = Connection("localhost", 27017)
db = connection["test"]

# list the collections that already exist in the "test" database
for collection in db.collection_names():
    print(collection)

collection = db["testCollection"]

# insert 10,000 small documents, one insert call per document
for i in range(10000):
    collection.insert({"i": i, "body": 'a - %s' % (i,)})
======================================
$time ./test_load.py
real 0m5.301s
user 0m4.160s
sys 0m0.230s
$du -hs /data/db/
81M /data/db/
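The timing output can be turned into rough numbers. This sketch only reuses the figures printed above; it assumes the usual meaning of `time`, where user and sys count CPU time of the script's own process, not of mongod.

```python
# Back-of-the-envelope numbers from the `time` output above.
# Only the figures shown in the post are used; nothing is re-measured.
DOCS = 10000
REAL_S = 5.301   # wall-clock time
USER_S = 4.160   # CPU time in user mode for the Python client process
SYS_S = 0.230    # kernel CPU time on behalf of the client process

inserts_per_second = DOCS / REAL_S
client_cpu_share = (USER_S + SYS_S) / REAL_S

print("%.0f inserts/s" % inserts_per_second)
print("client CPU: %.0f%% of wall time" % (client_cpu_share * 100))
```

That works out to about 1886 inserts per second, with the Python client busy on CPU for roughly 83% of the wall time, which suggests most of the run is spent in the client rather than waiting on the server.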