Drury, B.M and Morais Drury, S (2021) BrAgriSpeech: A Corpus of Brazilian-Portuguese Agricultural Reported Speech. In: Text Speech and Dialogue (TSD), 6/09/2021 - 9/09/2021, Olomouc.
Text: BrAgriSpeech.pdf - Accepted Version (326kB), restricted to repository staff only
Abstract
Agriculture is one of Brazil's largest industries. In Brazil, the price of crops such as sugarcane is driven not only by production levels but also by speculation and rumour, and some crop derivatives, such as ethanol, have their prices regulated by the government. Reported comments from influential speakers such as government ministers and agricultural-business leaders can affect prices and, in some cases, the level of production of food products. Currently, there are no corpora in Brazilian-Portuguese that contain agriculture-related speech together with the speakers and their employers. BrAgriSpeech is a corpus built using linguistic rules and pre-trained models to extract reported speech, the speaker and, where available, the speaker's employer, as well as a discourse connector that connects the speaker with the quote. The resource contains 6982 quotes in JSONL format. A sample of 50 quotes was manually evaluated, yielding an accuracy of 0.77 for quote identification, 0.82 for speaker identification and 0.87 for discourse connector identification. The resource is publicly available to encourage further research in the area.
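Because the abstract states that the quotes are distributed in JSONL format (one JSON object per line), a minimal sketch of reading such a file is given here. The field names (`quote`, `speaker`, `employer`, `connector`) and the two sample records are assumptions invented for illustration; the abstract does not specify the corpus schema, so the actual keys may differ.

```python
import json
from collections import Counter

# Hypothetical records for illustration only: the field names and values
# below are assumptions, not taken from the BrAgriSpeech corpus.
SAMPLE_JSONL = """\
{"quote": "A safra deve crescer este ano", "speaker": "Maria Silva", "employer": "Ministério da Agricultura", "connector": "disse"}
{"quote": "Os preços do etanol vão subir", "speaker": "João Souza", "employer": null, "connector": "afirmou"}
"""


def parse_jsonl(text):
    """Parse JSONL text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


records = parse_jsonl(SAMPLE_JSONL)

# Count how often each discourse connector appears.
connector_counts = Counter(record["connector"] for record in records)

# Keep only quotes for which an employer was identified
# (the abstract notes that the employer is present only "where available").
with_employer = [record for record in records if record.get("employer")]
```

For a file on disk, the same approach would apply by iterating over the lines of the opened file instead of `text.splitlines()`.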
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Additional Information and Comments: | The final publication is available from: https://link.springer.com/chapter/10.1007/978-3-030-83527-9_34 |
Keywords: | corpus agriculture text mining speech discourse |
Faculty / Department: | Faculty of Human and Digital Sciences > Mathematics and Computer Science |
Depositing User: | Brett Drury |
Date Deposited: | 05 Nov 2021 13:35 |
Last Modified: | 05 Nov 2021 13:35 |
URI: | https://hira.hope.ac.uk/id/eprint/3408 |