Accessing these traces requires an account on the 'obelix' cluster in the department. If you do not already have an account, contact Tyler Trafford <trafford.cs.umass.edu> to obtain one.
Notes from Emmanuel Cecchet about this dataset are below.
The data is on bigbackup:/srv/backup2/wikibench, which is also mounted on obelix:
cecchet@obelix.cs.umass.edu:/nfs/bigbackup/wikibench/traces>ls
2009-10 2010-01 2010-03 2010-06 filter_en_wikibooks.sh index.html?C=D;O=D index.html?C=N;O=A index.html?C=S;O=D
2009-11 2010-02 2010-04 2010-07 index.html index.html?C=M;O=A index.html?C=N;O=D mysql
2009-12 2010-02.03-wikibooks 2010-05 2010-08 index.html?C=D;O=A index.html?C=M;O=D index.html?C=S;O=A wikibooks
There is one directory per month.
A month of data is approximately 250GB compressed:
cecchet@obelix.cs.umass.edu:/nfs/bigbackup/wikibench/traces>du --si 2010-05
257G 2010-05
The compression factor is about 6, so expect one month of traces to be about 1.5TB uncompressed.
cecchet@obelix.cs.umass.edu:/nfs/bigbackup/wikibench/traces/2010-05>ls -lh wiki.1274957416.gz
-rw-r--r-- 1 cecchet lass 63M May 27 2010 wiki.1274957416.gz
cecchet@obelix.cs.umass.edu:/nfs/bigbackup/wikibench/traces/2010-05>ls -lh wiki.1274957416
-rw-r--r-- 1 cecchet lass 376M May 27 2010 wiki.1274957416
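The estimate above can be checked with a few lines of arithmetic. This is only a sketch: the variable names are made up for illustration, and it assumes the single sample file shown in the listings is representative of the whole month.

```python
# Sizes copied from the listings above.
compressed_month_gb = 257   # du --si 2010-05
sample_gz_mb = 63           # wiki.1274957416.gz
sample_raw_mb = 376         # wiki.1274957416

# The ratio is unitless, so the units reported by ls -lh do not matter here.
ratio = sample_raw_mb / sample_gz_mb
uncompressed_month_tb = compressed_month_gb * ratio / 1000

print(f"compression ratio: about {ratio:.1f}")
print(f"one month uncompressed: about {uncompressed_month_tb:.1f} TB")
```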
Each line in the log consists of:
- a unique request id,
- a timestamp,
- the URL being accessed (there is no information on the origin of the request, though you can sometimes find more information embedded in the URL).
The sample line below also ends with a trailing "-" field:
154132716 2010-05-27T09:31:55.63 http://en.m.wikipedia.org/wiki/Judge_Me_Tender -
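A minimal Python sketch of how such a line could be parsed. The names TraceRecord, parse_trace_line and iter_trace are hypothetical, and the sketch assumes that fields are separated by whitespace, that URLs contain no raw spaces, and that every timestamp has a fractional-seconds part as in the sample line. The meaning of the trailing "-" field is not documented in these notes, so it is kept as an opaque string.

```python
import gzip
from datetime import datetime
from typing import Iterator, NamedTuple


class TraceRecord(NamedTuple):
    request_id: int
    timestamp: datetime
    url: str
    extra: str  # trailing field ("-" in the sample line); meaning not documented here


def parse_trace_line(line: str) -> TraceRecord:
    """Split one whitespace-separated trace line into its fields."""
    parts = line.split()
    if len(parts) < 3:
        raise ValueError(f"unexpected trace line: {line!r}")
    request_id, timestamp, url = parts[:3]
    extra = parts[3] if len(parts) > 3 else ""
    return TraceRecord(
        request_id=int(request_id),
        # strptime's %f accepts 1 to 6 fractional digits (".63" -> 630000 microseconds)
        timestamp=datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%f"),
        url=url,
        extra=extra,
    )


def iter_trace(path: str) -> Iterator[TraceRecord]:
    """Stream records from a .gz trace file without unpacking it on disk."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.strip():
                yield parse_trace_line(line)


record = parse_trace_line(
    "154132716 2010-05-27T09:31:55.63 http://en.m.wikipedia.org/wiki/Judge_Me_Tender -"
)
print(record.request_id, record.timestamp.isoformat(), record.url)
```

Reading the .gz files as a stream avoids materializing the roughly sixfold larger uncompressed files.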
The traces originally come from Guillaume Pierre in Amsterdam. The paper describing the workload is at http://dl.acm.org/citation.cfm?id=1551224
Here is what I figured out so far from the URLs:
Wikibooks traces
The parameters to the PHP scripts are described at http://www.mediawiki.org/wiki/Manual:Parameters_to_index.php
- Wiki page access (GET):
- http://en.wikibooks.org/wiki/NameOfThePage (this is what the user has clicked on)
- Show random book (read interaction):
- http://en.wikibooks.org/w/api.php?action=query&indexpageids=1&generator=random&grnnamespace=0%7C110&grnlimit=10&prop=categories&cllimit=100&format=json&callback=showRandBookCB&requestid=rb4
- Rendering (bookcmd=rendering):
- http://en.wikibooks.org/w/index.php?title=Special:Book&bookcmd=rendering&return_to=Special%3ABook&collection_id=2d1a78448da71d72&writer=rl
- Write and Special actions:
- POST
- Edit a page:
http://en.wikibooks.org/w/index.php?title=Oracle_Programming/10g_Advanced_SQL&action=edit&section=3
- Submit (when edit is done):
http://en.wikibooks.org/w/index.php?title=A-level_Biology/Human_Health_and_Disease/infectious_diseases&action=submit
- RSS feeds (subscription to an RSS feed):
- http://en.wikibooks.org/w/index.php?title=Special:RecentChanges&feed=atom
- http://en.wikibooks.org/w/index.php?title=Special:RecentChanges&feed=rss
- Search (when &suggest is appended, the request was generated automatically by the web browser each time the user hit a key in the search field, in order to fetch suggestions for that field):
- http://en.wikibooks.org/w/api.php?action=opensearch&search=carcinoi&namespace=0%7C4%7C112&suggest
- Login:
- http://en.wikibooks.org/w/index.php?title=Special:UserLogin&type=signup
- From a page:
- http://en.wikibooks.org/w/index.php?title=Special:UserLogin&returnto=Help:Collections
- http://en.wikibooks.org/w/index.php?title=Special:UserLogin&action=submitlogin&type=login&returnto=Help:Collections
- Logout:
- http://en.wikibooks.org/w/index.php?title=Special:UserLogout&returnto=Main_Page
- Versioning (requires revisions of pages)
- http://en.wikibooks.org/w/index.php?title=Na%27vi/Verbs&diff=1720852&oldid=prev
- http://en.wikibooks.org/w/index.php?diff=1725464&oldid=1725463&rcid=1733365&diffonly=1&action=render
- http://192.168.245.200/w/index.php?title=Talk:French&action=history
- Ignored interactions:
- Images (downloaded as part of the page):
- http://upload.wikimedia.org/wikibooks/en/1/18/CompRad1.jpg actually located in http://IP/images/1/18/CompRad1.jpg
- http://upload.wikimedia.org/wikibooks/en/thumb/0/00/Nonlinear_separable.JPG/300px-Nonlinear_separable.JPG
- CentralAuth (http://www.mediawiki.org/wiki/Extension:CentralAuth) for authentication among multiple wikis (shared accounts):
- http://en.wikibooks.org/w/api.php?action=query&meta=globaluserinfo&guiprop=merged%7Cunattached&format=json&guiuser=Karmine201
- Talk?
- http://en.wikibooks.org/w/api.php?inprop=protection%7Ctalkid%7Csubjectid%7Curl%7Creadable&format=json&rvprop=content%7Cids%7Cflags%7Ctimestamp%7Cuser%7Ccomment%7Csize&prop=revisions%7Cinfo&titles=User%20talk%3ALaleena&rvlimit=1&action=query
- Redirects:
- http://en.wikibooks.org/w/api.php?redirects=1&tllimit=500&format=json&rvprop=ids%7Ccontent%7Ctimestamp%7Cuser&prop=revisions%7Ccategories&titles=General+Chemistry%2FThermodynamics%2FThe+First+Law+of+Thermodynamics%7CGeneral+Chemistry%2FSolubility%7CTemplate:Element+color%2FAlkaline+earth+metals%7CTemplate:Element+color%2FAlkaline+earth+metals%2FPrint%7CGeneral+Chemistry%2FIntroduction%7CGeneral+Chemistry%2FThermodynamics%2FIntroduction%7CGeneral+Chemistry%2FChemical+Equilibria%2FLe+Chatelier%27s+Principle%7CGeneral+Chemistry%2FChemical+Equilibria%2FEquilibrium%7CTemplate:General+Chemistry%2FNavigation%7CTemplate:General+Chemistry%2FNavigation%2FPrint%7CGeneral+Chemistry%2FReaction+Mechanisms%7CGeneral+Chemistry%2FChemical+Equilibria%2FSolutions+in+Equilibrium%7CGeneral+Chemistry%2FThermodynamics%2FThe+Second+Law+of+Thermodynamics%7CGeneral+Chemistry%2FBook+Cover%7CGeneral+Chemistry%2FChemistries+of+Various+Elements%2FGroup+2&action=query&imlimit=500
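The URL patterns listed above could be turned into a rough classifier along the following lines. This is a heuristic sketch, not a complete or authoritative mapping: the function name classify_url and the category labels are made up for illustration, the rules only cover the example URLs in these notes, and anything unrecognized falls through to "other" or "api_other".

```python
from urllib.parse import parse_qs, urlsplit


def classify_url(url: str) -> str:
    """Map a trace URL to a coarse interaction label (labels are illustrative)."""
    parts = urlsplit(url)
    # keep_blank_values keeps bare flags such as "&suggest" that carry no "=value"
    query = parse_qs(parts.query, keep_blank_values=True)

    def first(name: str) -> str:
        return query.get(name, [""])[0]

    if parts.netloc == "upload.wikimedia.org":
        return "image"                      # ignored interaction
    if parts.path.startswith("/wiki/"):
        return "page_view"
    if parts.path == "/w/api.php":
        if first("action") == "opensearch":
            return "search_suggest" if "suggest" in query else "search"
        if first("generator") == "random":
            return "random_book"
        if first("meta") == "globaluserinfo":
            return "central_auth"           # ignored interaction
        return "api_other"
    if parts.path == "/w/index.php":
        title = first("title")
        action = first("action")
        if first("bookcmd") == "rendering":
            return "book_rendering"
        if title.startswith("Special:UserLogin"):
            return "login"
        if title.startswith("Special:UserLogout"):
            return "logout"
        if first("feed") in ("atom", "rss"):
            return "feed"
        if "diff" in query or action == "history":
            return "versioning"
        if action == "edit":
            return "edit"
        if action == "submit":
            return "submit"
    return "other"


for sample in (
    "http://en.wikibooks.org/wiki/NameOfThePage",
    "http://en.wikibooks.org/w/index.php?title=Special:RecentChanges&feed=rss",
    "http://upload.wikimedia.org/wikibooks/en/1/18/CompRad1.jpg",
):
    print(classify_url(sample), sample)
```

The login check runs before the action checks so that action=submitlogin is never confused with a page submit, and the diff check runs before them so that diff requests carrying action=render are still counted as versioning.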