Showing posts with label problem. Show all posts

Tuesday, October 20, 2009

mongoDB gridfs and sharding

If you want to use GridFS and shard the chunks collection, be aware that the example contains a mistake: sharding by "_id" doesn't work.

Working example.
First, follow the sample configuration session:
http://www.mongodb.org/display/DOCS/A+Sample+Configuration+Session

and then shard the chunks collection from the mongo shell:
$ mongo
> use admin
switched to db admin
> db.runCommand( { shardcollection : "test.dexters.chunks", key : { n : 1 } } )
{"collectionsharded" : "test.dexters.chunks" , "ok" : 1}

$ vim test_load.py
====================================================
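#!/usr/bin/env python
# Usage: python test_load.py <path-to-file>
import sys
import os.path
from pymongo.connection import Connection
from gridfs import GridFS

# Connect to the server on localhost and select the "test" database.
connection = Connection("localhost", 27017)
db = connection["test"]

# Store the file under its base name.
name = os.path.basename(sys.argv[1])

# Open a GridFS file for writing in the "dexters" collection,
# so its chunks end up in the sharded test.dexters.chunks.
fs = GridFS(db)
fp = fs.open(name, 'w', 'dexters')

# Copy the input file into GridFS line by line.
for line in open(sys.argv[1]):
    fp.write(line)
fp.close()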
====================================================
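
To see why "n" is available as a shard key, it helps to look at what a chunk document roughly looks like. The sketch below is a toy model, not the pymongo implementation: split_into_chunks and the chunk size are hypothetical, only the field names files_id, n and data follow the GridFS layout.

```python
# Toy model of GridFS chunking. This is NOT the pymongo implementation;
# the function name and the chunk size are assumptions for illustration.
CHUNK_SIZE = 256 * 1024  # assumed chunk size in bytes


def split_into_chunks(files_id, data, chunk_size=CHUNK_SIZE):
    """Return documents shaped like those in a GridFS chunks collection."""
    chunks = []
    for n, start in enumerate(range(0, len(data), chunk_size)):
        chunks.append({
            "files_id": files_id,  # same value for every chunk of one file
            "n": n,                # 0-based position of the chunk in the file
            "data": data[start:start + chunk_size],
        })
    return chunks


if __name__ == "__main__":
    # A 10-byte "file" with a tiny chunk size, just to show the layout.
    for doc in split_into_chunks("file-a", b"x" * 10, chunk_size=4):
        print(doc["files_id"], doc["n"], len(doc["data"]))
```

Every chunk of a file shares the same files_id, while n counts up from 0, so the two fields distribute documents very differently when used as a shard key.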

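There are at least two sharding strategies: by files_id or by (n, id).
Sharding by files_id is like no RAID or RAID-1 at the file level, because all chunks of one file share the same key value and stay together. Sharding by n across many servers can behave like RAID-0 or RAID-10, because the chunks of one file can be spread over several shards. At the collection level, performance is always like RAID-0 or RAID-10 ;-)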
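
The difference between the two keys can be sketched with a small simulation. This is only a toy model under stated assumptions: assign_shards does a naive range partition over the distinct key values, which is not how mongos actually balances chunks, and all function names here are hypothetical.

```python
from collections import defaultdict


def make_chunks(file_ids, chunks_per_file):
    """Build minimal chunk documents (files_id and n only)."""
    return [{"files_id": f, "n": n}
            for f in file_ids for n in range(chunks_per_file)]


def assign_shards(docs, key, num_shards):
    """Naive range partition: contiguous ranges of key values per shard."""
    values = sorted({doc[key] for doc in docs})
    # Map each distinct key value to a shard by its position in sorted order.
    shard_of = {v: i * num_shards // len(values) for i, v in enumerate(values)}
    return [(doc, shard_of[doc[key]]) for doc in docs]


def shards_per_file(placement):
    """For each file, collect the set of shards that hold its chunks."""
    result = defaultdict(set)
    for doc, shard in placement:
        result[doc["files_id"]].add(shard)
    return dict(result)


if __name__ == "__main__":
    docs = make_chunks(["a", "b", "c", "d"], chunks_per_file=4)
    print("by files_id:", shards_per_file(assign_shards(docs, "files_id", 2)))
    print("by n:       ", shards_per_file(assign_shards(docs, "n", 2)))
```

In this model, keying on files_id leaves each file on a single shard, while keying on n makes every file touch both shards, which is the striping effect compared to RAID-0 above.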