##################################
# Readme for Parsed FrameNet 1.0 #
##################################

Parsed FrameNet release 1.0 was created on May 20, 2012 from the original
FrameNet Release 1.5 (https://framenet.icsi.berkeley.edu/).

Dependency parses were created with the Stanford Parser version 1.6.8
(http://nlp.stanford.edu/software/lex-parser.shtml).

Frame annotations (FEEs and frame elements) were aligned to single dependency
nodes in the parse. We selected nodes to maximize token recall of the subtree
under each selected node against the tokens annotated in FrameNet. The data
set contains the first parse among the best-30 for each sentence that
maximizes the average token recall.

Details about the conversion process and an evaluation can be found in the
following publication:

Daniel Bauer, Hagen Fürstenau, and Owen Rambow. 2012. The Dependency-Parsed
FrameNet Corpus. In: Proceedings of the 8th Language Resources and Evaluation
Conference (LREC 2012). Istanbul, Turkey.

The original version of the data is available at:
http://www1.ccls.columbia.edu/~rambow/resources/parsed-framenet.html

Parsed FrameNet was created by
Daniel Bauer (bauer (at) cs (dot) columbia (dot) E D U),
Hagen Fürstenau (hagen (at) ccls (dot) columbia (dot) E D U), and
Owen Rambow (rambow (at) ccls (dot) columbia (dot) E D U).
You may contact the authors with questions.

License
---
Parsed FrameNet 1.0 (c) by Daniel Bauer, Hagen Fürstenau, and Owen Rambow.

Parsed FrameNet is licensed under a Creative Commons Attribution 3.0 Unported
License.

You should have received a copy of the license along with this work.
If not, see .

Files
---
The distribution contains two versions of the data, using different
dependency representations.

basic_dependencies - uses the 'basicDependencies' output option of the
    Stanford parser.
collapsed_dependencies - uses the 'CCPropagatedDependencies' output option,
    which collapses prepositions into dependency edge labels and uses a more
    semantically inspired representation for conjunctions.

For details on the dependency representation see
http://nlp.stanford.edu/software/stanford-dependencies.shtml

Each directory contains two subdirectories:

fulltext - contains the FrameNet 1.5 fulltext annotations.
lu - contains FrameNet lexicographic annotations (single annotation per
    sentence).

File Format
---
The individual data files use a format that is largely compatible with the
CoNLL-2008 format. Sentences are separated by a single empty line. Each
sentence consists of a header (a # at the beginning of a line marks
meta-information) and the data itself.

The header starts with a random identifier (RID), which is unique across the
whole dataset.

# RID: 559

RIDs are randomized to allow specification of a random sample as a range of
RIDs (e.g. RID 1-1000).

The header contains an entry for each Frame annotation. The indented rows list
all tokens that have been annotated as part of the frame evoking element (FEE)
or frame element in FrameNet. This information makes it possible to recompute
the recall score for each annotation span.

# Frame "Communication"
#     FEE: 4
#     Communicator: 2
#     Message: 6, 7, 8, 9, 10, 11, 12, 13

The final line of the header indicates which of the best-30 Stanford parses
was selected to maximize the average token recall of the annotated nodes,
together with that recall score.

# Mapped annotation onto parse #1 with average recall of 1.000

The annotated parse itself follows a one-token-per-line format. Each line
consists of at least 10 fields, separated by tabs. Empty fields are marked
with _ in the data.
 1 - Token ID in surface order, starting at 1 for each sentence
 2 - Literal form of the word
 3 - Lemmatized token according to TreeTagger
 4 - POS tag in FrameNet
 5 - POS tag according to TreeTagger
 6 -
 7 - 6-8 are empty for compatibility with the CoNLL-2008 format.
 8 -
 9 - Token ID of syntactic heads
10 - Grammatical relation to syntactic heads

An even number of subsequent fields contain frame information:

11 - Frame label of the first frame if annotated on this token
12 - Frame elements of the first frame if annotated on this token
13 - Frame label of the second frame ...
14 - Frame elements of the second frame ...
etc.

This representation makes it possible for one token to be the FEE or FE of
different frames.

For information on TreeTagger see
http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/

Example annotation:
---
# RID: 559
# Frame "Communication"
#     FEE: 4
#     Communicator: 2
#     Message: 6, 7, 8, 9, 10, 11, 12, 13
# Frame "Opinion"
#     FEE: 8
#     Cognizer: 6
#     Opinion: 9, 10, 11, 12, 13
# Mapped annotation onto parse #1 with average recall of 1.000
1   As          as        in    IN   _  _  _  4   mark    _              _             _        _
2   someone     someone   nn    NN   _  _  _  4   nsubj   _              Communicator  _        _
3   has         have      VHZ   VBZ  _  _  _  4   aux     _              _             _        _
4   said        say       VVD   VBD  _  _  _  8   advcl   Communication  _             _        _
5   ,           ,         ,     ,    _  _  _  _   _       _              _             _        _
6   Columbus    Columbus  NP    NNP  _  _  _  8   nsubj   _              _             _        Cognizer
7   only        only      rb    RB   _  _  _  8   advmod  _              _             _        _
8   thought     think     VVD   VBD  _  _  _  _   _       _              Message       Opinion  _
9   that        IN        in    IN   _  _  _  12  complm  _              _             _        _
10  he          he        PP    PRP  _  _  _  12  nsubj   _              _             _        _
11  had         have      VHD   VBD  _  _  _  12  aux     _              _             _        _
12  discovered  discover  VVN   VBN  _  _  _  8   ccomp   _              _             _        Opinion
13  Jamaica     Jamaica   NP    NNP  _  _  _  12  dobj    _              _             _        _
14  .           .         sent  .    _  _  _  _   _       _              _             _        _
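
Reading the format in Python (sketch)
---
The file format described above can be read with a few lines of Python. The
following is a minimal sketch, not part of the distribution. It assumes that
sentences are separated by empty lines, that header lines start with #, and
that token lines are tab-separated as described above. The names Token,
Sentence, parse_token and read_sentences are hypothetical and chosen only for
this illustration. The head field is kept as a raw string because this README
does not specify how more than one head would be encoded.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional, Tuple


def _value(s: str) -> Optional[str]:
    """Map the empty-field marker '_' to None."""
    return None if s == "_" else s


@dataclass
class Token:
    tid: int                    # field 1: token ID, starting at 1
    form: str                   # field 2: literal word form
    lemma: str                  # field 3: lemma
    fn_pos: str                 # field 4: POS tag in FrameNet
    pos: str                    # field 5: second POS tag
    head: Optional[str]         # field 9: kept as a raw string (assumption)
    deprel: Optional[str]       # field 10: grammatical relation
    # One (frame label, frame element) pair per frame, from fields 11 onward.
    frames: List[Tuple[Optional[str], Optional[str]]]


@dataclass
class Sentence:
    rid: Optional[int] = None
    header: List[str] = field(default_factory=list)
    tokens: List[Token] = field(default_factory=list)


def parse_token(line: str) -> Token:
    f = line.rstrip("\n").split("\t")
    if len(f) < 10:
        raise ValueError("expected at least 10 tab-separated fields")
    extra = f[10:]
    if len(extra) % 2 != 0:
        raise ValueError("frame fields must come in pairs")
    frames = [(_value(extra[i]), _value(extra[i + 1]))
              for i in range(0, len(extra), 2)]
    return Token(int(f[0]), f[1], f[2], f[3], f[4],
                 _value(f[8]), _value(f[9]), frames)


def read_sentences(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield one Sentence per block of header and token lines."""
    current = Sentence()
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # An empty line closes the current sentence (assumption).
            if current.header or current.tokens:
                yield current
            current = Sentence()
        elif line.startswith("#"):
            current.header.append(line)
            text = line[1:].strip()
            if text.startswith("RID:"):
                current.rid = int(text[len("RID:"):])
        else:
            current.tokens.append(parse_token(line))
    if current.header or current.tokens:
        yield current
```

Because read_sentences accepts any iterable of lines, an open file object can
be passed to it directly. Selecting a random sample by RID range, as suggested
above, then amounts to keeping the sentences whose rid falls within the
desired range.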