##################################
# Readme for Parsed FrameNet 1.0 #
##################################

Parsed FrameNet release 1.0 was created on May 20, 2012 from the original
FrameNet Release 1.5 (https://framenet.icsi.berkeley.edu/).

Dependency parses were created with the Stanford Parser version 1.6.8 
(http://nlp.stanford.edu/software/lex-parser.shtml).

Frame annotations (frame-evoking elements, or FEEs, and frame elements) were
aligned to single dependency nodes in the parse. We selected nodes to maximize
token recall of the subtree under each selected node against the tokens
annotated in FrameNet. For each sentence, the data set contains the first
parse among the 30 best that maximizes the average token recall.
Details about the conversion process and an evaluation can be found in the
following publication: 

    Daniel Bauer, Hagen Fürstenau, and Owen Rambow. 2012. The Dependency-Parsed
    FrameNet Corpus. In: Proceedings of the 8th Language Resources and 
    Evaluation Conference (LREC 2012). Istanbul, Turkey.

The original version of the data is available at:
http://www1.ccls.columbia.edu/~rambow/resources/parsed-framenet.html

Parsed FrameNet was created by

Daniel Bauer (bauer (at) cs (dot) columbia (dot) E D U), 
Hagen Fürstenau (hagen (at) ccls (dot) columbia (dot) E D U), and
Owen Rambow (rambow (at) ccls (dot) columbia (dot) E D U) 

You may contact the authors with questions.


License
---
Parsed FrameNet 1.0 (c) by Daniel Bauer, Hagen Fürstenau, and Owen Rambow.

Parsed FrameNet is licensed under a
Creative Commons Attribution 3.0 Unported License.

You should have received a copy of the license along with this
work.  If not, see <http://creativecommons.org/licenses/by/3.0/>.


Files
---
The distribution contains two versions of the data, using different dependency 
representations. 

basic_dependencies     - uses the 'basicDependencies' output option of the 
                         Stanford parser. 
collapsed_dependencies - uses the 'CCPropagatedDependencies' output option,
                         which collapses prepositions into dependency edge
                         labels and uses a more semantically inspired 
                         representation for conjunctions. 

For details on the dependency representation see
http://nlp.stanford.edu/software/stanford-dependencies.shtml

Each of the two directories contains two subdirectories:

fulltext - contains the FrameNet 1.5 fulltext annotations.
lu       - contains FrameNet lexicographic annotations (single annotation per
           sentence)


File Format 
---

The individual data files use a format that is largely compatible with the
CoNLL-2008 format. Sentences are separated by a single empty line. Each
sentence consists of a header (lines beginning with # carry meta-information)
followed by the data itself.

The header starts with a random identifier (RID), which is unique across the
whole dataset.

# RID: 559

RIDs are randomized to allow specification of a random sample as a range of
RIDs (e.g. RID 1-1000).
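
Drawing such a sample could be sketched in Python as follows. The function
name sample_by_rid is hypothetical and not part of the distribution. The sketch
assumes only that every sentence block begins with a "# RID: <number>" line, as
described above, and it drops empty lines.

```python
import re

# Matches the first header line of a sentence, e.g. "# RID: 559".
RID_LINE = re.compile(r"^#\s*RID:\s*(\d+)\s*$")

def sample_by_rid(lines, low, high):
    """Yield (rid, block_lines) for sentences with low <= RID <= high.

    Assumption: a "# RID: N" line opens every sentence block, so the next
    RID line (or the end of input) closes the current block.
    """
    rid, block = None, []
    for line in lines:
        line = line.rstrip("\n")
        match = RID_LINE.match(line)
        if match:
            if rid is not None and low <= rid <= high:
                yield rid, block
            rid, block = int(match.group(1)), []
        if rid is not None and line.strip():
            block.append(line)
    if rid is not None and low <= rid <= high:
        yield rid, block
```

Any iterable of lines works as input, for example an open file object for one
of the data files.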

The header contains an entry for each frame annotation. The indented rows
list all tokens that were annotated in FrameNet as part of the frame-evoking
element (FEE) or of a frame element. This information makes it possible to
recompute the recall score for each annotation span.

# Frame "Communication"
#     FEE: 4
#     Communicator: 2
#     Message: 6, 7, 8, 9, 10, 11, 12, 13

The final line of the header names the parse that was selected from the
30 best Stanford parses to maximize average token recall, together with the
average token recall that its annotated nodes achieve.

# Mapped annotation onto parse #1 with average recall of 1.000
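
A header in this layout could be read with a few regular expressions. The
following Python sketch is not part of the distribution: the function name
parse_header and the shape of the returned dictionary are hypothetical, and
the patterns are derived only from the header lines shown in this file.

```python
import re

FRAME_LINE = re.compile(r'^#\s*Frame\s+"(.+)"\s*$')
# "#     Message: 6, 7, 8" and also "# RID: 559"
ENTRY_LINE = re.compile(r"^#\s+([^:]+):\s*(\d+(?:\s*,\s*\d+)*)\s*$")
MAPPED_LINE = re.compile(
    r"^#\s*Mapped annotation onto parse #(\d+) "
    r"with average recall of ([\d.]+)\s*$"
)

def parse_header(header_lines):
    """Parse the '#' lines of one sentence into a dictionary."""
    info = {"rid": None, "frames": [], "parse": None, "recall": None}
    for line in header_lines:
        mapped = MAPPED_LINE.match(line)
        frame = FRAME_LINE.match(line)
        entry = ENTRY_LINE.match(line)
        if mapped:
            info["parse"] = int(mapped.group(1))
            info["recall"] = float(mapped.group(2))
        elif frame:
            info["frames"].append({"name": frame.group(1), "spans": []})
        elif entry and entry.group(1).strip() == "RID":
            info["rid"] = int(entry.group(2))
        elif entry and info["frames"]:
            # Indented row: role name and the token IDs annotated for it.
            tokens = [int(t) for t in entry.group(2).split(",")]
            info["frames"][-1]["spans"].append((entry.group(1).strip(), tokens))
    return info
```

Spans are kept as a list of (role, token IDs) pairs rather than a dictionary,
so that a role listed more than once in a frame would not be overwritten.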

The annotated parse itself follows a one-token-per-line format. 
Each line consists of at least 10 fields, separated by tabs.
Empty fields are marked with _ in the data. 

1  - Token ID in surface order, starting at 1 for each sentence
2  - Literal form of the word 
3  - Lemmatized token according to TreeTagger 
4  - POS tag in FrameNet
5  - POS tag according to TreeTagger
6-8 - Empty; kept for compatibility with the CoNLL-2008 format
9  - Token ID of syntactic heads
10 - Grammatical relation to syntactic heads

Any further fields come in pairs, one pair per frame annotated in the sentence:

11 - Frame label of the first frame if annotated on this token
12 - Frame elements of the first frame if annotated on this token
13 - Frame label of the second frame ...
14 - Frame elements of the second frame...
etc.

This representation makes it possible for one token to be the FEE or FE of
different frames.
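
As an illustration, one token line could be split into named fields as in the
following Python sketch. The function name parse_token_line and the dictionary
keys are hypothetical. Fields 9 and 10 are kept as raw strings, because this
file does not describe how several heads for one token would be encoded.

```python
def parse_token_line(line):
    """Split one tab-separated token line into a dictionary."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError("expected at least 10 tab-separated fields")
    extra = fields[10:]
    if len(extra) % 2:
        raise ValueError("frame fields must come in label/element pairs")

    def value(field):
        # "_" marks an empty field.
        return None if field == "_" else field

    return {
        "id": int(fields[0]),
        "form": fields[1],  # kept verbatim, even if it is a literal "_"
        "lemma": value(fields[2]),
        "pos_framenet": value(fields[3]),
        "pos_treetagger": value(fields[4]),
        "head": value(fields[8]),    # raw string, see note above
        "deprel": value(fields[9]),  # raw string
        # One (frame label, frame element) pair per frame in the sentence.
        "frames": [(value(extra[i]), value(extra[i + 1]))
                   for i in range(0, len(extra), 2)],
    }
```

A token that is a frame element of the first frame and the FEE of the second
frame would come back with frames equal to
[(None, "Message"), ("Opinion", None)].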

For information on TreeTagger see
http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/

Example annotation:
---

# RID: 559
# Frame "Communication"
#     FEE: 4
#     Communicator: 2
#     Message: 6, 7, 8, 9, 10, 11, 12, 13
# Frame "Opinion"
#     FEE: 8
#     Cognizer: 6
#     Opinion: 9, 10, 11, 12, 13
# Mapped annotation onto parse #1 with average recall of 1.000
1	As	as	in	IN	_	_	_	4	mark	_	_	_	_
2	someone	someone	nn	NN	_	_	_	4	nsubj	_	Communicator	_	_
3	has	have	VHZ	VBZ	_	_	_	4	aux	_	_	_	_
4	said	say	VVD	VBD	_	_	_	8	advcl	Communication	_	_	_
5	,	,	,	,	_	_	_	_	_	_	_	_	_
6	Columbus	Columbus	NP	NNP	_	_	_	8	nsubj	_	_	_	Cognizer
7	only	only	rb	RB	_	_	_	8	advmod	_	_	_	_
8	thought	think	VVD	VBD	_	_	_	_	_	_	Message	Opinion	_
9	that	IN	in	IN	_	_	_	12	complm	_	_	_	_
10	he	he	PP	PRP	_	_	_	12	nsubj	_	_	_	_
11	had	have	VHD	VBD	_	_	_	12	aux	_	_	_	_
12	discovered	discover	VVN	VBN	_	_	_	8	ccomp	_	_	_	Opinion
13	Jamaica	Jamaica	NP	NNP	_	_	_	12	dobj	_	_	_	_
14	.	.	sent	.	_	_	_	_	_	_	_	_	_
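
The token recall for this example can be recomputed from the header spans and
the head column. The Python sketch below is not the original conversion code.
The function names are hypothetical, each token is assumed to have at most one
head (as in this example), and the tie-breaking rule in best_node (smallest
subtree first, then lowest token ID) is an assumption made for illustration;
the publication cited above describes the actual procedure.

```python
def subtree(node, heads):
    """Return the set of token IDs in the subtree rooted at `node`."""
    children = {}
    for token, head in heads.items():
        if head is not None:
            children.setdefault(head, []).append(token)
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, []))
    return seen

def token_recall(node, heads, annotated):
    """Fraction of annotated tokens covered by the subtree under `node`."""
    annotated = set(annotated)
    return len(subtree(node, heads) & annotated) / len(annotated)

def best_node(heads, annotated):
    """Pick the node with the highest token recall.

    Assumption: ties go to the smallest subtree, then to the lowest ID.
    """
    return max(
        heads,
        key=lambda n: (token_recall(n, heads, annotated),
                       -len(subtree(n, heads)),
                       -n),
    )

# Head column (field 9) of the example sentence; None marks an empty field.
HEADS = {1: 4, 2: 4, 3: 4, 4: 8, 5: None, 6: 8, 7: 8, 8: None,
         9: 12, 10: 12, 11: 12, 12: 8, 13: 12, 14: None}

message = [6, 7, 8, 9, 10, 11, 12, 13]
print(best_node(HEADS, message), token_recall(8, HEADS, message))
```

For the Message span, node 8 is the only node whose subtree covers all eight
annotated tokens, which agrees with the Message label on token 8 in the data.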