WSDM Cup Ranker Challenge
| Date | Milestone |
|---|---|
| August 5 2015 | Rules published |
| August 7 2015 | Registration site open |
| August 28 2015 | Phase 1 submissions open |
| October 30 2015 | Registrations close |
| November 6 2015 | Team membership disclosure deadline |
| November 14 2015 | Phase 1 closes @ 11:59pm PST (updated) |
| November 16 2015 | Finalists announced and notified |
| December 1 2015 | Phase 2 submissions due @ 11:59pm PST |
| January 31 2016 | Phase 2 evaluation closes |
| February 22 2016 | WSDM Cup Workshop |
The goal of the Ranker Challenge (the “Challenge”) is to assess the query-independent importance of scholarly articles using data from the Microsoft Academic Graph, a large heterogeneous graph composed of publications, authors, venues, organizations, and fields of study. Specifically, participants must provide the best static rank value (as defined in http://en.wikipedia.org/wiki/Learning_to_rank or http://www2006.org/programme/files/xhtml/3101/p3101-Richardson.html) for each publication entity in the graph.
Static rank plays a key role in recommendation systems, especially in cold-start scenarios, and in search engines, where it helps determine the ranking of search results (e.g., for queries like “papers by author x” or “papers about topic y”). Traditional metrics have relied heavily on citations, which favor more established, seminal papers and treat all citations as equal (and positive) indicators of importance and impact. We invite the community to jointly explore and develop better alternatives in this challenge.
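To make the idea of a static rank concrete, the sketch below computes PageRank over citation edges, a common citation-based baseline that, unlike a raw citation count, weights each citation by the score of the citing paper. It is only an illustrative starting point under stated assumptions (citations given as hypothetical `(citing_id, cited_id)` pairs, a damping factor of 0.85, a fixed iteration count); it is not the organizers' method or the algorithm behind the graph's own rank score.

```python
def pagerank(citations, damping=0.85, iterations=50):
    """Static-rank sketch: power-iteration PageRank over citation edges.

    citations: iterable of (citing_id, cited_id) pairs (assumed input shape).
    Returns {paper_id: score}; the scores sum to 1, so each lies in [0, 1].
    """
    out_links = {}
    papers = set()
    for citing, cited in citations:
        papers.update((citing, cited))
        out_links.setdefault(citing, []).append(cited)
    n = len(papers)
    if n == 0:
        return {}
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Papers that cite nothing ("dangling" nodes) spread their mass uniformly.
        dangling = sum(rank[p] for p in papers if p not in out_links)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in papers}
        # Each citing paper splits its current score evenly among the papers it cites.
        for citing, cited_list in out_links.items():
            share = damping * rank[citing] / len(cited_list)
            for cited in cited_list:
                new_rank[cited] += share
        rank = new_rank
    return rank
```

Because the scores form a probability distribution, they already fall in the 0 to 1 range that the submission format expects; a paper cited by two papers ends up above the papers that cite it.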
The Microsoft Academic Graph is a heterogeneous graph containing scientific publication records, citation relationships between publications, as well as authors, institutions, journal and conference "venues," and fields of study. The data is distributed as a set of zipped text files stored in Microsoft Azure blob storage and downloadable over HTTP. The zipped download is about 37 GB.
We encourage researchers working with this data to apply for an Azure for Research Award. Details of the Azure for Research award submission process are available at http://azure4research.com. Please include the hashtag #academicgraph in your submission title for easier tracking.
The data provided for the challenge may be accessed and downloaded from http://aka.ms/academicgraph.
Note: In order to emphasize one important technical challenge that is common in web-scale data collection and aggregation, the data released here have undergone only rudimentary processing, for example in areas of author and paper conflation/deduplication. This noisy yet realistic dataset can provide additional avenues for research in the big data arena.
Additionally, the Microsoft Academic Graph contains a simple rank score for each paper in the graph. The score reflects the quantized natural log of the importance of a paper based on the number and quality of papers citing it.
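As a rough illustration of what a "quantized natural log" score means, the sketch below maps a positive importance value to an integer bucket of its natural logarithm. The bucket width (and the use of floor rather than rounding) are assumptions for illustration only; the exact quantization used for the graph's rank score is not specified here.

```python
import math


def quantized_log_score(importance, bucket_width=1.0):
    """Hypothetical sketch: integer bucket of ln(importance).

    bucket_width and the use of floor() are assumptions; the quantization
    actually applied to the graph's rank score is not documented here.
    """
    if importance <= 0:
        raise ValueError("importance must be positive")
    return math.floor(math.log(importance) / bucket_width)
```

Taking the logarithm compresses a heavy-tailed importance distribution into a small range, and quantizing it means many papers share the same score, so the provided score cannot distinguish papers that fall into the same bucket.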
External Data Sources
Teams are welcome to use external data in their approaches as long as that data is publicly accessible. All teams participating in Phase 2 will be asked to document all external data sources and detail how the data was used. See Phase 2 below.
Registration for the WSDM Cup will open on August 7 2015 and will remain open until October 30 2015. To register, a team must first sign in with a Microsoft Account and then specify a unique team name (the team name is used as a unique identifier and must not contain spaces or other non-alphanumeric characters).
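The team-name rule can be checked locally before registering. This is a minimal sketch, assuming "alphanumeric" means ASCII letters and digits and that uniqueness is a case-sensitive comparison; the registration site's actual validation may differ, and `existing_names` is a hypothetical stand-in for names already taken.

```python
import re

# ASCII letters and digits only; treating "alphanumeric" as ASCII is an assumption.
TEAM_NAME_PATTERN = re.compile(r"[A-Za-z0-9]+")


def is_valid_team_name(name, existing_names=frozenset()):
    """Return True if `name` contains no spaces or other non-alphanumeric
    characters and is not already taken (case-sensitive check, assumed)."""
    return TEAM_NAME_PATTERN.fullmatch(name) is not None and name not in existing_names
```

For example, `is_valid_team_name("TeamX42")` is `True`, while `"team x"` and `"team-x"` are rejected.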
After registering, the team will receive a link to the evaluation dataset as well as a limited-access token for an Azure storage container to which Phase 1 submissions are uploaded. One week before the close of Phase 1 (November 6 2015), each team will also be required to disclose its team membership, including the full name, affiliation, and email address of each member.
There is no limit to the number of people on any given team. Individuals may be members of multiple teams (not to exceed three), but no two teams may have exactly the same members or team name. If two teams have the same membership, both teams will be disqualified.
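The two membership rules (at most three teams per person, no two teams with identical members) are easy to check mechanically. This sketch assumes teams are given as a hypothetical mapping from team name to a set of member identifiers such as email addresses; dictionary keys already guarantee distinct team names.

```python
from collections import Counter

MAX_TEAMS_PER_PERSON = 3


def check_team_rules(teams):
    """Return a list of rule violations for {team_name: set_of_member_ids}.

    Member ids (e.g. email addresses) are an assumed representation.
    """
    problems = []
    # Rule 1: nobody may belong to more than three teams.
    counts = Counter(member for members in teams.values() for member in members)
    for person, n in sorted(counts.items()):
        if n > MAX_TEAMS_PER_PERSON:
            problems.append(f"{person} is on {n} teams (limit {MAX_TEAMS_PER_PERSON})")
    # Rule 2: no two teams may have exactly the same membership.
    seen = {}
    for team, members in sorted(teams.items()):
        key = frozenset(members)
        if key in seen:
            problems.append(f"{seen[key]} and {team} have identical membership")
        else:
            seen[key] = team
    return problems
```

Under the rules, every team named in an "identical membership" message would be disqualified, not just the later one.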
The evaluation will be conducted in two phases.
During Phase 1, submissions (see details below) will be scored by their agreement with human judgment data. The organizers have invited a group of Computer Science researchers to rank pairs of papers in the fields in which they actively conduct research. The pairwise judgment data will then be randomly split into an Evaluation set and a Test set. Submissions during Phase 1 will be automatically scored against the Evaluation set and added to a public leaderboard sorted by percentage agreement with the judgment data.
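The scoring described above can be approximated with a pairwise-agreement metric. This is a minimal sketch under several assumptions: each judgment is a hypothetical `(preferred_id, other_id)` pair meaning the judge ranked the first paper higher, ties count as disagreements, and the split fraction and random seed are placeholders, since the organizers' actual split and tie handling are not published. Papers missing from a submission score 0, as the submission rules state.

```python
import random


def split_judgments(judgments, test_fraction=0.5, seed=0):
    """Randomly partition judged pairs into (evaluation, test) lists.

    test_fraction and seed are hypothetical; the real split is not published.
    """
    shuffled = list(judgments)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]


def pairwise_agreement(scores, judgments):
    """Percentage of judged pairs whose order the submission reproduces.

    judgments: (preferred_id, other_id) pairs. Papers absent from `scores`
    count as 0. Ties are treated as disagreements (an assumption).
    """
    if not judgments:
        raise ValueError("no judgments to score against")
    agreed = sum(
        1
        for preferred, other in judgments
        if scores.get(preferred, 0.0) > scores.get(other, 0.0)
    )
    return 100.0 * agreed / len(judgments)
```

Because only the relative order of two papers matters, any order-preserving rescaling of a submission leaves this score unchanged.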
Teams may submit trials as often as they wish during this phase. The leaderboard will be publicly available at http://aka.ms/wsdmcup2016 and will show each team's rank based on the score of its most recent submission.
One week before the close of Phase 1, each team must identify the name, affiliation, and email address of each team member. Failure to do so will result in the team being disqualified. Submitting the team member composition implies a commitment to author a paper to be considered for presentation in a corresponding workshop for the WSDM Cup.
At the close of Phase 1, the website will stop accepting new submissions. The most recent submission from each team will be evaluated against the Test set, and the resulting scores, ranked by percentage agreement with the Test set, will be posted to the leaderboard. The top eight teams on the leaderboard at the close of Phase 1 will be invited to participate in Phase 2 and to present their work at the WSDM Cup workshop at the WSDM 2016 Conference, to be held February 23-25 2016 in San Francisco, CA. Participants are solely responsible for arranging and paying for any travel required to attend the Conference. Workshop papers from authors who do not participate in the WSDM Cup will also be peer-reviewed and considered.
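The leaderboard and finalist selection rules (only each team's most recent submission counts; the top eight advance) can be sketched as follows. The `Submission` record and its fields are hypothetical, and `score` stands for whichever set the submission is scored against (the Evaluation set during Phase 1, the Test set at its close). Tie-breaking is not specified in the rules; this sketch simply keeps input order among equal scores.

```python
from typing import NamedTuple


class Submission(NamedTuple):
    team: str
    submitted_at: float  # e.g. a Unix timestamp (assumed representation)
    score: float         # percentage agreement with the judgment data


def finalists(submissions, k=8):
    """Return the top-k team names, using only each team's latest submission."""
    latest = {}
    for s in submissions:
        if s.team not in latest or s.submitted_at > latest[s.team].submitted_at:
            latest[s.team] = s
    # Stable sort: equal scores keep their relative order (tie rule unspecified).
    board = sorted(latest.values(), key=lambda s: s.score, reverse=True)
    return [s.team for s in board[:k]]
```

Note that a team's earlier, higher-scoring submission does not count if a later submission scores lower.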
Each finalist team participating in Phase 2 will be provided with access to an updated graph and will be asked to re-run their algorithms over this new graph. A new access token will be provided to each of the finalist teams, and the final rank values will be submitted to Azure storage. Each team must also submit a workshop paper outlining their approach, details of their algorithm, results, and information about any external data sources used.
Phase 2 of the Challenge will be conducted by Microsoft Research in coordination with Bing. Each finalist's rank values will be deployed as public-facing flights (live experiments) in Bing search results and will power the ranker used by Bing for academic queries. The organizers will present basic statistics of the Phase 2 evaluation at the workshop. Because the Phase 2 evaluation metrics will remain proprietary, this WSDM Cup will officially acknowledge winners based on the Phase 1 evaluation results, with the ranking from Phase 2 provided as a public reference.
Each challenge submission will be a single tab-separated (.tsv) text file containing one row per paper in the graph, in the following format:
[paper id] \t [probability score] \n
The paper ID must match the format used in the graph data: an 8-digit hexadecimal number. The probability score is a decimal value between 0 and 1 representing the importance of a paper relative to the other papers in the graph. If a paper is missing from the results, its score is assumed to be 0.
The filename must be “results.tsv”.
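A minimal sketch of producing and re-reading `results.tsv` in the required format is shown below. It assumes the raw static-rank values are held in a dictionary keyed by paper ID; min-max scaling into [0, 1] is just one possible choice (it preserves the order of papers, which is all the pairwise Phase 1 evaluation looks at), and the fixed 15-decimal formatting is an assumption that could merge extremely close scores. The function names are hypothetical.

```python
import re

# Paper IDs in the graph are 8-digit hexadecimal strings (case-insensitive here, assumed).
PAPER_ID = re.compile(r"[0-9A-Fa-f]{8}")


def normalize(raw):
    """Min-max scale arbitrary scores into [0, 1] (one possible choice)."""
    lo, hi = min(raw.values()), max(raw.values())
    span = hi - lo
    return {pid: (v - lo) / span if span else 0.0 for pid, v in raw.items()}


def format_results(scores):
    """Render {paper_id: score} as '[paper id]\\t[probability score]\\n' rows."""
    lines = []
    for pid, score in scores.items():
        if not PAPER_ID.fullmatch(pid):
            raise ValueError(f"paper id is not 8 hex digits: {pid!r}")
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score outside [0, 1] for {pid}: {score}")
        lines.append(f"{pid}\t{score:.15f}\n")
    return "".join(lines)


def parse_results(text):
    """Read rows back into a dict; absent papers should be looked up with default 0."""
    scores = {}
    for line in text.splitlines():
        if line:
            pid, score = line.split("\t")
            scores[pid] = float(score)
    return scores
```

To write the file, open `results.tsv` with `open("results.tsv", "w", encoding="utf-8", newline="")` inside a `with` block and write `format_results(normalize(raw_scores))`. For a graph of this size, streaming rows to the file instead of building one large string is likely more practical.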
Results will be submitted through a central Azure storage account set up for the WSDM Cup challenge at https://wsdmcupchallenge.blob.core.windows.net.
Each team will be given a container on the storage account, e.g. https://wsdmcupchallenge.blob.core.windows.net/team-x/
Teams can upload their results to the container as often as they like. Access is controlled via a Shared Access Signature (SAS) provided during team registration (search Bing for examples of how to do this).
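One way to upload with a SAS is the Azure Blob "Put Blob" REST operation: an HTTP PUT to the blob URL with the SAS token as the query string and an `x-ms-blob-type: BlockBlob` header. The sketch below only builds the request with Python's standard library; the container name `team-x` comes from the example above, and the token value is a placeholder for the one issued at registration. Single Put Blob requests are size-limited (the limit depends on the storage service version), so a full-graph results file will likely need a block upload; the Azure SDKs, such as `BlobClient.upload_blob` in the `azure-storage-blob` Python package, split large uploads into blocks automatically.

```python
import urllib.request
from urllib.parse import quote

ACCOUNT_URL = "https://wsdmcupchallenge.blob.core.windows.net"


def build_upload_request(container, sas_token, data, blob_name="results.tsv"):
    """Build (but do not send) a 'Put Blob' request authorized by a SAS token."""
    sas = sas_token.lstrip("?")  # tolerate tokens copied with a leading '?'
    url = f"{ACCOUNT_URL}/{quote(container)}/{quote(blob_name)}?{sas}"
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={
            "x-ms-blob-type": "BlockBlob",
            "Content-Type": "text/tab-separated-values",
        },
    )
```

Sending it is then a matter of `urllib.request.urlopen(build_upload_request("team-x", sas_token, payload))`, where `payload` holds the bytes of `results.tsv`.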
You may participate in the Challenge as an individual or as part of a team. However, teams must register via the registration website prior to October 30 2015 and must finalize their team composition prior to November 6 2015. Although you can be a member of more than one team (not to exceed three), no two teams can have exactly the same members or team name. Each team must select one individual as the corresponding author for the challenge.