Page history last edited by Makoto Inoue 12 years, 7 months ago


Use Case Interview 02: Mikio Hirabayashi, author of the Tokyo/Kyoto products


About Tokyo Cabinet


Q: What's your favourite method, or the method you use most often?



I use put/get the most, but my favourite method is putcat, which appends data to the end of an existing value.
My original motivation for developing the DBM was that I needed storage for the inverted indexes of a full-text search engine. For updating a posting list (the position data of an inverted index), putcat is the best method.
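
The append-style update described above can be sketched as follows. This is a minimal illustration, not Tokyo Cabinet itself: a plain Python dict stands in for the DBM, and the helper names (`putcat`, `index_document`, `postings`) and the 8-byte posting layout are assumptions made for the example.

```python
import struct

# A plain dict stands in for the DBM here; this is not Tokyo Cabinet itself.
db = {}

def putcat(store, key, value):
    """Append value to the end of the existing value, creating it if absent."""
    store[key] = store.get(key, b"") + value

def index_document(store, doc_id, text):
    """Append one (doc_id, position) posting for every word occurrence."""
    for position, word in enumerate(text.lower().split()):
        # Hypothetical posting layout: two little-endian 32-bit integers.
        putcat(store, word.encode("utf-8"), struct.pack("<II", doc_id, position))

def postings(store, word):
    """Decode the concatenated value back into (doc_id, position) pairs."""
    raw = store.get(word.encode("utf-8"), b"")
    return [struct.unpack_from("<II", raw, offset) for offset in range(0, len(raw), 8)]

index_document(db, 1, "tokyo cabinet is fast")
index_document(db, 2, "kyoto cabinet is new")
print(postings(db, "cabinet"))  # → [(1, 1), (2, 1)]
```

Each new document only appends to the existing posting list, so the index never has to read a value back and rewrite it at the application level.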


Q: How do you use the Lua extension, personally or at work?


I created the Lua extension to manage "footprints" at mixi. A footprint record is built by appending the visiting user's ID and a timestamp with putcat.
However, I needed a bit more logic than plain putcat, such as removing old data, or keeping only the newest timestamp when the same user visits several times on the same day. To achieve that, I needed to run application logic on the server side. That's why I implemented the Lua extension, which lets scripts work as server-side functions.
Having said that, we ended up not using TT for this purpose, because the existing "MySQL + sharding" solution was good enough and a complete rewrite was not worth it.
Personally, I use it to aggregate or summarise small-scale log data.
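
The footprint logic described above might look roughly like the sketch below. It is written in Python rather than Lua and uses a dict as a stand-in store; the record layout, the 30-day retention period, the key format, and all function names are assumptions for illustration, not mixi's implementation. It also assumes visits arrive in increasing timestamp order.

```python
import struct
from datetime import datetime, timezone

RECORD = struct.Struct("<IQ")     # hypothetical layout: visitor id, Unix timestamp
MAX_AGE_SECONDS = 30 * 24 * 3600  # assumed retention period

def day_of(timestamp):
    """Calendar day (UTC) that a Unix timestamp falls on."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).date()

def add_footprint(store, owner_id, visitor_id, timestamp):
    """Append a visit, dropping stale entries and same-day duplicates."""
    key = b"footprint:%d" % owner_id
    raw = store.get(key, b"")
    kept = []
    for offset in range(0, len(raw), RECORD.size):
        uid, ts = RECORD.unpack_from(raw, offset)
        if timestamp - ts > MAX_AGE_SECONDS:
            continue  # too old: remove
        if uid == visitor_id and day_of(ts) == day_of(timestamp):
            continue  # same user, same day: keep only the newer visit
        kept.append((uid, ts))
    kept.append((visitor_id, timestamp))
    store[key] = b"".join(RECORD.pack(uid, ts) for uid, ts in kept)

def footprints(store, owner_id):
    """Decode the stored value into (visitor_id, timestamp) pairs."""
    raw = store.get(b"footprint:%d" % owner_id, b"")
    return [RECORD.unpack_from(raw, o) for o in range(0, len(raw), RECORD.size)]
```

Running this kind of read-filter-append step inside the server avoids a round trip per update, which is the reason given above for wanting more than a pure putcat.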


Q: TC's distinguishing feature is that it accesses the disk yet is nearly as fast as memory access. What is the main secret behind that?


I just repeated the cycle of "run the profiler, find the bottlenecks, and fix them". The most effective technique was to access the region mapped by mmap directly, rather than reading and writing with pread/pwrite.
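
The two access styles mentioned above can be contrasted in a small sketch. It uses Python's standard `os` and `mmap` modules on a Unix-like system, and the file name and offsets are arbitrary. It only shows the difference in how the data is reached; it does not measure speed, and it is not how TC is implemented internally.

```python
import mmap
import os
import tempfile

# Hypothetical 4 KiB scratch file standing in for a database file.
path = os.path.join(tempfile.mkdtemp(), "casket.bin")
with open(path, "wb") as f:
    f.write(b"\0" * 4096)

fd = os.open(path, os.O_RDWR)
try:
    # Style 1: positional I/O, one system call per read or write.
    os.pwrite(fd, b"hello", 128)
    via_pread = os.pread(fd, 5, 128)

    # Style 2: map the file once, then touch it like ordinary memory.
    with mmap.mmap(fd, 4096) as region:
        via_mmap = bytes(region[128:133])  # plain slice, no read call
        region[256:261] = b"world"         # plain assignment, no write call
        region.flush()
    written_via_mmap = os.pread(fd, 5, 256)
finally:
    os.close(fd)
```

After the mapping is set up, each access in the second style is a memory operation rather than a system call, which is the kind of saving the answer refers to.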


Q: You have already talked about uses of TC at mixi.jp such as the last-login timestamp database, a memcached alternative, and search (using Tokyo Dystopia). What else do you use TC for?


I often use it to store intermediate results of data mining, such as link relationships between users or keyword counts.



Q: When TC is used as a memcached alternative, I assume it is meant to hold very small values. Ravelry, however, caches HTML fragments (Fragment Cache) in TC rather than on disk, so it stores values of several to tens of kilobytes (details: http://tokyocabinetwiki.pbworks.com/01_CaseyForbesFromRavelry). What do you think about that?


I think storing the fragment cache in TC is more efficient than saving each fragment as a separate file in a directory. With separate files, every access has to open and close both the directory and the file, which carries a fair amount of overhead, depending on the filesystem implementation. Compared with that, TC is worth using for values of a few kilobytes. Once values reach several megabytes, reading and writing the data itself dominates over open/close, so using the filesystem directly becomes more efficient.

Q: Do you use Table DB only for running Tokyo Promenade (CMS) on your personal website?


Not quite. It is true that my motivation for developing Table DB was to implement Tokyo Promenade. However, I also wanted to be able to build back-office admin systems for mixi easily.

Q: What mistakes do beginners commonly make when running TC in production, and what should they be especially careful about?


Make sure you set appropriate values for the xmsiz and bnum parameters. Take backups periodically if you don't want to lose data in a system crash. You may also want to use transactions where appropriate.
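
As a rough illustration of choosing these two parameters, the sketch below builds a database name with tuning suffixes in the `name#param=value` form accepted by Tokyo Tyrant and the abstract database API. The helper `tuned_db_name` and its arguments are hypothetical. The factor of 2 for bnum is an assumption that sits inside the commonly documented guideline of roughly 0.5 to 4 times the expected number of records, and the xmsiz value (extra mapped memory, in bytes) should be chosen from your own file size and available RAM.

```python
def tuned_db_name(path, expected_records, mapped_bytes, bucket_factor=2):
    """Build a hash database name carrying bnum and xmsiz tuning suffixes.

    bucket_factor is an assumed multiplier; adjust it to your workload.
    """
    if expected_records <= 0 or mapped_bytes <= 0:
        raise ValueError("expected_records and mapped_bytes must be positive")
    bnum = expected_records * bucket_factor  # size of the bucket array
    return f"{path}#bnum={bnum}#xmsiz={mapped_bytes}"

# Example: about one million records and 256 MiB of extra mapped memory.
name = tuned_db_name("casket.tch", 1_000_000, 256 * 1024 * 1024)
print(name)  # → casket.tch#bnum=2000000#xmsiz=268435456
```

A bucket array that is too small for the record count leads to long collision chains, and a mapped region much smaller than the file pushes more accesses back to ordinary file I/O, so both values are worth revisiting as the data grows.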

Q: Any message for people who are about to try TC/TT?


Don't expect TC/TT to be more than a DBM and its interface. You may be disappointed if you try to replace an RDBMS with TC/TT. Having said that, if you are used to modelling data as simple hashes (as in Perl/Ruby/Python/PHP), you may find it useful to be able to persist a hash (or associative array).
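
To make the "persistent hash" idea concrete, here is a minimal sketch using `dbm.dumb` from Python's standard library. It is a different and much simpler DBM than TC, used here only as a stand-in; the file path and keys are made up for the example.

```python
import dbm.dumb
import os
import tempfile

# Hypothetical database location in a scratch directory.
path = os.path.join(tempfile.mkdtemp(), "casket")

# Write through a dict-like interface, then close the database.
with dbm.dumb.open(path, "c") as db:
    db[b"user:1"] = b"alice"
    db[b"user:2"] = b"bob"

# Reopen read-only: the key/value pairs survived the close.
with dbm.dumb.open(path, "r") as db:
    restored = {key: db[key] for key in db.keys()}
```

The interface is just keys and values. There are no joins, schemas, or query language, which is the limit the answer above warns about.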

About Kyoto Cabinet



Q: Do you have any plans to develop Table DB or Kyoto Tyrant?


I do plan to develop Table DB, but it is a lower priority and will not happen within the next 6 months.
I have no plan for Kyoto Tyrant yet. However, it is easy to embed Kyoto Cabinet into Tokyo Tyrant by using TC's skeleton database, so I will probably build a prototype of that soon.



Q: If Kyoto Tyrant is developed, how will it give access to the Visitor pattern?


I don't have any good ideas yet. Probably via Lua?



Q: Do you have any plans to make them pluggable, so that third parties can create their own database types (as with MySQL storage engines)?


It's already implemented that way. You can override the behaviour of ProtoDB if you define a subclass of the FileDB class and pass an instance of it to the ProtoDB constructor.



Q: You said that KC is "a sibling of TC". Does this mean you will continue developing TC, or do you intend to consolidate on KC at some point?


I am going to maintain both TC and KC.



Q: Do you already use KC in any projects? If not, what kind of use cases do you have in mind?


I would like to use KC for multi-threaded applications.



Q: Any message for people who are about to try KC?


Please try the Ruby binding first. You will find the Visitor pattern useful.



About the community


Q: What kind of information would you most like to receive from the community?


Bug reports would be my first priority.
I am wary of feature bloat, so I am unlikely to accept new feature requests. That's why I implemented the Visitor pattern, so that users can implement their own requirements.
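
The idea of letting users supply their own per-record logic through a visitor can be sketched as below. This is a toy model in Python, loosely shaped like a visitor with "record exists" and "record missing" callbacks. `TinyDB`, `Increment`, `RemoveIfZero`, and the `NOP`/`REMOVE` markers are hypothetical names for this example, not the Kyoto Cabinet API.

```python
NOP = object()     # marker: leave the record unchanged
REMOVE = object()  # marker: delete the record

class Visitor:
    """Base visitor: by default it changes nothing."""
    def visit_full(self, key, value):
        return NOP
    def visit_empty(self, key):
        return NOP

class TinyDB:
    """Toy in-memory store that applies a visitor to one record."""
    def __init__(self):
        self._records = {}

    def accept(self, key, visitor):
        if key in self._records:
            result = visitor.visit_full(key, self._records[key])
        else:
            result = visitor.visit_empty(key)
        if result is REMOVE:
            self._records.pop(key, None)
        elif result is not NOP:
            self._records[key] = result  # any other return value is stored

    def get(self, key):
        return self._records.get(key)

class Increment(Visitor):
    """User-defined operation: a counter the store never had to know about."""
    def visit_full(self, key, value):
        return value + 1
    def visit_empty(self, key):
        return 1

class RemoveIfZero(Visitor):
    """User-defined operation: conditional delete."""
    def visit_full(self, key, value):
        return REMOVE if value == 0 else NOP
```

For example, calling `accept("hits", Increment())` twice on a fresh `TinyDB` leaves the value 2 under `"hits"`. The store itself gains no new methods; each new requirement is just another visitor class.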



Q: Is there a set procedure or policy you would like people to follow when submitting patches or bug reports?


I am not going to apply patches directly, as that may cause licensing issues. If I apply someone else's patch, copyright in the product becomes shared, which would make it difficult to change the license later. MySQL and the FSF get around this by requiring contributors to sign agreements assigning their copyright, but that would be a bit too much for me. That's why I tend to check the bug report and fix the problem myself.
If you find a bug, I would appreciate it if you could provide some test code so that I can reproduce it.



Q: Patches and bug reports would be easier to follow if the code were hosted somewhere like GitHub or if there were a bug tracking system (BTS). Do you have any plans to introduce either?



I don't have any plans to do either, mainly because I don't have enough time to deal with all the BTS tickets (for now).



Other Topics


Q: tmaesaka is using Tokyo Cabinet as the underlying storage for BlitzDB, a storage engine for Drizzle. If you know more about the project, could you tell us about it (for example, whether SQL can be used, the benefits of embedding TC in Drizzle, and how the project is progressing)?


Since BlitzDB is a storage engine for Drizzle, it can use the full feature set of SQL. BlitzDB is aiming at use cases similar to those of MyISAM. It won't have transactions, but it will have high performance. By building on TC, BlitzDB benefits from TC's high performance.

Q: What are you doing in your photo? Is it the Kamehameha from Dragon Ball?


No. It's "Zanku Hadou Ken" of "Gouki"(aka Akuma) from Street Fighter.


