Introduction

Sparv is Språkbanken’s annotation tool and contains a corpus import pipeline and a web service including a web interface. Sparv’s API is available at https://ws.spraakbanken.gu.se/ws/sparv/v2/.

This documentation is for API version 2.

Queries for Annotating Texts

Queries to the web service can be sent as a simple GET request:

https://ws.spraakbanken.gu.se/ws/sparv/v2/?text=En+exempelmening+till+nättjänsten

POST requests are also supported using the same address. This can be useful for longer texts. Here is an example using curl:

curl -X POST --data-binary text="En exempelmening till nättjänsten" https://ws.spraakbanken.gu.se/ws/sparv/v2/

The response from the POST request is the same as for the above GET request. See default query for more details.

It is also possible to upload text or XML files using curl:

curl -X POST -F files[]=@/path/to/file/myfile.txt https://ws.spraakbanken.gu.se/ws/sparv/v2/upload?

In this case the response is a download link to a zip file containing the annotation.

Settings

The web service supports some custom settings, e.g. it lets you chose between different tokenizers on word, sentence, and paragraph level and you can define which annotations should be generated. Via the settings you can also chose the language of your input and you can define whether your input is in xml or plain text.

These settings are provided as a JSON object to the settings variable. This object must satisfy the JSON schema available at the following adress:

https://ws.spraakbanken.gu.se/ws/sparv/v2/schema?language=sv&mode=plain

The schema holds default values for all the attributes. The use of the settings variable is therefore optional.

A request which only generates a dependency analysis could look like this:

https://ws.spraakbanken.gu.se/ws/sparv/v2/?text=Det+trodde+jag+aldrig.&settings={"positional_attributes":{"dependency_attributes":["ref","dephead","deprel"],"lexical_attributes":[],"compound_attributes":[]}}

Or as a POST request using curl:

curl -X POST -g --data-binary text="Det trodde jag aldrig." 'https://ws.spraakbanken.gu.se/ws/sparv/v2/?settings={"positional_attributes":{"dependency_attributes":["ref","dephead","deprel"],"lexical_attributes":[],"compound_attributes":[]}}'

If you are not sure how to define the settings variable you can check the example settings or you can get help from the frontend by clicking Show JSON Settings under Show advanced settings. This will generate the JSON object for the chosen settings which is sent in the settings variable.

The makefile which is generated for a certain set of settings can be viewed by sending a makefile query:

https://ws.spraakbanken.gu.se/ws/sparv/v2/makefile?settings={"positional_attributes":{"dependency_attributes":["ref","dephead","deprel"],"lexical_attributes":[],"compound_attributes":[]}}

Joining a Build

At the top of the XML response you can find a hash number inside the build-tag. This hash can be used to join an earlier build.

The following request is used for joining the build from the first example of this documentation:

https://ws.spraakbanken.gu.se/ws/sparv/v2/join?hash=57fce7e430c7ab4dd83d5244b566dade92595db2

The response contains the chosen settings, the original text and of course the result of the annotation. See join for more details.

Analysing other Languages

The default analysis language is Swedish but Sparv also supports other languages. The language is specified by supplying a two-letter language code to the language parameter.

This is an example for an analysis of a German sentence:

https://ws.spraakbanken.gu.se/ws/sparv/v2/?text=Nun+folgt+ein+deutscher+Beispielsatz.&language=de

The following table shows the languages that are currently supported and the tools that are used to analyse them:

Language	Code	Analysis Tool
Bulgarian	bg	TreeTagger
Catalan	ca	FreeLing
Dutch	nl	TreeTagger
Estonian	et	TreeTagger
English	en	FreeLing
French	fr	FreeLing
Finnish	fi	TreeTagger
Galician	gl	FreeLing
German	de	FreeLing
Italian	it	FreeLing
Latin	la	TreeTagger
Norwegian	no	FreeLing
Polish	pl	TreeTagger
Portuguese	pt	FreeLing
Romanian	ro	TreeTagger
Russian	ru	FreeLing
Slovak	sk	TreeTagger
Slovenian	sl	FreeLing
Spanish	es	FreeLing

Different kinds of settings are supported for different languages, depending on which tool is used for the analysis. Please use the frontend if you want to check which options there are for a certain language. Alternatively you can check the JSON schema for the language you want to analyse by sending a schema request, e.g.:

https://ws.spraakbanken.gu.se/ws/sparv/v2/schema?language=de

Progress Information

By adding the flag incremental=true to your usual query you can receive more information on how your analysis is being processed. An example query could look like this:

https://ws.spraakbanken.gu.se/ws/sparv/v2/?text=Nu+med+inkrementell+information&incremental=true

The resulting XML will contain the following extra tags:

<increment command="" step="0" steps="27"/>
<increment command="sb.segment" step="1" steps="27"/>
<increment command="sb.segment" step="2" steps="27"/>
<increment command="sb.number --position" step="3" steps="27"/>
<increment command="sb.segment" step="4" steps="27"/>
<increment command="sb.annotate --span_as_value" step="5" steps="27"/>
<increment command="sb.annotate --text_spans" step="6" steps="27"/>
<increment command="sb.parent --children" step="7" steps="27"/>
<increment command="sb.parent --parents" step="8" steps="27"/>
<increment command="sb.parent --parents" step="9" steps="27"/>
<increment command="sb.parent --parents" step="10" steps="27"/>
<increment command="sb.number --position" step="11" steps="27"/>
<increment command="sb.hunpos" step="12" steps="27"/>
<increment command="sb.number --relative" step="13" steps="27"/>
<increment command="sb.annotate --select" step="14" steps="27"/>
<increment command="sb.saldo" step="15" steps="27"/>
<increment command="sb.readability --lix" step="16" steps="27"/>
<increment command="sb.readability --ovix" step="17" steps="27"/>
<increment command="sb.readability --nominal_ratio" step="18" steps="27"/>
<increment command="sb.compound" step="19" steps="27"/>
<increment command="sb.wsd" step="20" steps="27"/>
<increment command="sb.malt" step="21" steps="27"/>
<increment command="sb.sentiment" step="22" steps="27"/>
<increment command="sb.annotate --select" step="23" steps="27"/>
<increment command="sb.annotate --select" step="24" steps="27"/>
<increment command="sb.annotate --chain" step="25" steps="27"/>
<increment command="sb.cwb --export" step="26" steps="27"/>
<increment command="" step="27" steps="27"/>

Note that this information will only be displayed if your query is run for the first time. The progress information is not available for older builds.

Available calls

api

Shows this API documentation.

methods: GET
example: https://ws.spraakbanken.gu.se/ws/sparv/v2/api

default query

When provided with the text parameter this call handles text input and runs the Sparv analysis.

methods: GET, POST
parameters:
- text (required)
- settings, default: settings returned from https://ws.spraakbanken.gu.se/ws/sparv/v2/schema
- language, default: sv
- mode, default: plain, options: plain, xml, file
- incremental, default: False
examples:
- https://ws.spraakbanken.gu.se/ws/sparv/v2/?text=En+exempelmening+till+nättjänsten
- curl -X POST --data-binary text="En exempelmening till nättjänsten" https://ws.spraakbanken.gu.se/ws/sparv/v2/
result:

<result>
  <build hash="57fce7e430c7ab4dd83d5244b566dade92595db2"/>
  <corpus link="https://ws.spraakbanken.gu.se/ws/sparv/v2/download?hash=57fce7e430c7ab4dd83d5244b566dade92595db2">
    <text lix="54.00" ovix="inf" nk="inf">
      <paragraph>
        <sentence id="8f7-84d">
          <w pos="DT" msd="DT.UTR.SIN.IND" lemma="|en|" lex="|en..al.1|" sense="|den..1:-1.000|en..2:-1.000|" complemgram="|" compwf="|" sentiment="0.6799" ref="1" dephead="2" deprel="DT">En</w>
          <w pos="NN" msd="NN.UTR.SIN.IND.NOM" lemma="|exempelmening|" lex="|" sense="|" complemgram="|exempel..nn.1+mening..nn.1:1.309e-08|" compwf="|exempel+mening|" ref="2" deprel="ROOT">exempelmening</w>
          <w pos="PP" msd="PP" lemma="|till|" lex="|till..pp.1|" sense="|till..1:-1.000|" complemgram="|" compwf="|" sentiment="0.5086" ref="3" dephead="2" deprel="ET">till</w>
          <w pos="NN" msd="NN.UTR.SIN.DEF.NOM" lemma="|nättjänst|nättjänsten|" lex="|" sense="|" complemgram="|nät..nn.1+tjänst..nn.2:6.298e-11|nät..nn.1+tjänst..nn.1:6.298e-11|nätt..av.1+tjänst..nn.1:1.140e-12|nätt..av.1+tjänst..nn.2:1.140e-12|nät..nn.1+tjäna..vb.1+sten..nn.2:2.303e-27|nät..nn.1+tjäna..vb.1+sten..nn.1:2.303e-27|nätt..av.1+tjäna..vb.1+sten..nn.1:5.537e-28|nätt..av.1+tjäna..vb.1+sten..nn.2:5.537e-28|" compwf="|nät+tjänsten|nätt+tjänsten|nät+tjän+sten|nätt+tjän+sten|" ref="4" dephead="3" deprel="PA">nättjänsten</w>
        </sentence>
      </paragraph>
    </text>
  </corpus>
</result>

ping

Pings the backend, responds with the status of the catapult.

methods: GET
example: https://ws.spraakbanken.gu.se/ws/sparv/v2/ping
result:

<catapult time="0.0063">PONG</catapult>

schema

Returns the json schema generated from the provided parameters.

methods: GET
parameters:
- language, default: sv
- mode, default: plain, other options: xml, file
example: https://ws.spraakbanken.gu.se/ws/sparv/v2/schema?language=sv&mode=plain
result: json schema for the given language and text mode

makefile

Returns the Makefile generated from the provided parameters.

methods: GET, POST
parameters:
- language, default: sv
- mode, default: plain, other options: xml, file
- settings
- incremental, default: False
examples:
- https://ws.spraakbanken.gu.se/ws/sparv/v2/makefile?settings={"positional_attributes":{"dependency_attributes":["ref","dephead","deprel"],"lexical_attributes":[],"compound_attributes":[]}}
- curl -g -X POST 'https://ws.spraakbanken.gu.se/ws/sparv/v2/makefile?settings={"positional_attributes":{"dependency_attributes":["ref","dephead","deprel"],"lexical_attributes":[],"compound_attributes":[]}}'
result:

include $(SPARV_MAKEFILES)/Makefile.config
corpus = untitled
original_dir = original

vrt_columns_annotations = word ref dephead.ref deprel
vrt_columns             = word ref dephead     deprel

vrt_structs_annotations = sentence.id paragraph.n text text.lix text.ovix text.nk
vrt_structs             = sentence:id paragraph   text text:lix text:ovix text:nk

xml_elements    = text
xml_annotations = text

token_chunk = sentence
token_segmenter = better_word

sentence_chunk = paragraph
sentence_segmenter = punkt_sentence

paragraph_chunk = text
paragraph_segmenter = blanklines

include $(SPARV_MAKEFILES)/Makefile.rules

upload

Handles file uploads and runs the analysis.

methods: POST
parameters:
- language, default: sv
- mode, default: plain, other options: xml, file
- email
- files
- settings
example: curl -X POST -F files[]=@/path/to/file/myfile.txt https://ws.spraakbanken.gu.se/ws/sparv/v2/upload?
result: a download link to a zip file containing the annotation

download

Handles download of result files.

methods: GET
parameters:
- hash (required)
example: https://ws.spraakbanken.gu.se/ws/sparv/v2/download?hash=a0c3861b251a595c83859c6cf4c595e8c71ad8da-f
result: a zip file containing the annotation

join

Joins an existing build.

methods: GET, POST
parameters:
- hashnumber (required)
- language, default: sv
- mode, default: plain, other options: xml, file
- incremental, default: False
examples:
- https://ws.spraakbanken.gu.se/ws/sparv/v2/join?hash=57fce7e430c7ab4dd83d5244b566dade92595db2
- curl -X POST 'https://ws.spraakbanken.gu.se/ws/sparv/v2/join?hash=57fce7e430c7ab4dd83d5244b566dade92595db2'
result:

<result>
    <settings>{
      "root": {
        "attributes": [],
        "tag": "text"
      },
      "text_attributes": {
        "readability_metrics": [
          "lix",
          "ovix",
          "nk"
        ]
      },
      "word_segmenter": "default_tokenizer",
      "positional_attributes": {
        "lexical_attributes": [
          "pos",
          "msd",
          "lemma",
          "lex",
          "sense"
        ],
        "compound_attributes": [
          "complemgram",
          "compwf"
        ],
        "dependency_attributes": [
          "ref",
          "dephead",
          "deprel"
        ],
        "sentiment": [
          "sentiment"
        ]
      },
      "sentence_segmentation": {
        "sentence_chunk": "paragraph",
        "sentence_segmenter": "default_tokenizer"
      },
      "paragraph_segmentation": {
        "paragraph_segmenter": "blanklines"
      },
      "lang": "sv",
      "textmode": "plain",
      "named_entity_recognition": [],
      "corpus": "untitled"
      }
      </settings>
      <original>&lt;text&gt;En exempelmening till nättjänsten&lt;/text&gt;</original>
      <build hash='57fce7e430c7ab4dd83d5244b566dade92595db2'/>
      <corpus link='https://ws.spraakbanken.gu.se/ws/sparv/v2/download?hash=57fce7e430c7ab4dd83d5244b566dade92595db2'>
        <text lix="54.00" ovix="inf" nk="inf">
          <paragraph>
            <sentence id="8f7-84d">
              <w pos="DT" msd="DT.UTR.SIN.IND" lemma="|en|" lex="|en..al.1|" sense="|den..1:-1.000|en..2:-1.000|" complemgram="|" compwf="|" sentiment="0.6799" ref="1" dephead="2" deprel="DT">En</w>
              <w pos="NN" msd="NN.UTR.SIN.IND.NOM" lemma="|exempelmening|" lex="|" sense="|" complemgram="|exempel..nn.1+mening..nn.1:1.309e-08|" compwf="|exempel+mening|" ref="2" deprel="ROOT">exempelmening</w>
              <w pos="PP" msd="PP" lemma="|till|" lex="|till..pp.1|" sense="|till..1:-1.000|" complemgram="|" compwf="|" sentiment="0.5086" ref="3" dephead="2" deprel="ET">till</w>
              <w pos="NN" msd="NN.UTR.SIN.DEF.NOM" lemma="|nättjänst|nättjänsten|" lex="|" sense="|" complemgram="|nät..nn.1+tjänst..nn.2:6.298e-11|nät..nn.1+tjänst..nn.1:6.298e-11|nätt..av.1+tjänst..nn.1:1.140e-12|nätt..av.1+tjänst..nn.2:1.140e-12|nät..nn.1+tjäna..vb.1+sten..nn.2:2.303e-27|nät..nn.1+tjäna..vb.1+sten..nn.1:2.303e-27|nätt..av.1+tjäna..vb.1+sten..nn.1:5.537e-28|nätt..av.1+tjäna..vb.1+sten..nn.2:5.537e-28|" compwf="|nät+tjänsten|nätt+tjänsten|nät+tjän+sten|nätt+tjän+sten|" ref="4" dephead="3" deprel="PA">nättjänsten</w>
            </sentence>
          </paragraph>
        </text>
      </corpus>
      </result>

status

Returns the status of existing builds. Requires secret_key parameter in query.

methods: GET
parameters:
- secret_key (required)
example: https://ws.spraakbanken.gu.se/ws/sparv/v2/status?secret_key=supersekretkey
result:

<status>
  <build hash="d91d063efb5a8439643147c7367e3a4ddad5ec63" status="Done" since="2018-05-11 18:48:32" accessed="2018-05-11 18:29:57" accessed-secs-ago="326021.5"/>
  <build hash="736e99a73b5c9fdc1d284397a8790df17afe3214-f" status="Done" since="2018-05-11 18:49:57" accessed="2018-05-11 15:38:19" accessed-secs-ago="336318.9"/>
  <build hash="57fce7e430c7ab4dd83d5244b566dade92595db2" status="Done" since="2018-05-11 18:48:45" accessed="2018-05-15 11:47:15" accessed-secs-ago="4582.9"/>
</status>

cleanup

Removes builds that are finished and haven’t been accessed within the timeout (7 days). Requires secret_key parameter in query.

methods: GET
parameters:
- secret_key (required)
example: https://ws.spraakbanken.gu.se/ws/sparv/v2/cleanup?secret_key=supersekretkey
result:

<message>
    <removed hash="1e1c4cdb04d593f1526ae21dd3908cfa7e6ca805"/>
    <removed hash="34dfdc2538023e44e7892ee9ac7f1071c6349544"/>
    <removed hash="2cac2b20734661dca6c388c46153aff79380d6d8"/>
</message>

Or if there are no old builds:

<message>No hashes to be removed.</message>

cleanup/errors

Removes builds that are finished and haven’t been accessed within the timeout (7 days) and the builds with status Error. Requires secret_key parameter in query.

methods: GET
parameters:
- secret_key (required)
example: https://ws.spraakbanken.gu.se/ws/sparv/v2/cleanup/errors?secret_key=supersekretkey
result:

<message>
    <removed hash="1e1c4cdb04d593f1526ae21dd3908cfa7e6ca805"/>
    <removed hash="34dfdc2538023e44e7892ee9ac7f1071c6349544"/>
    <removed hash="2cac2b20734661dca6c388c46153aff79380d6d8"/>
</message>

Or if there are no builds with status Error:

<message>No hashes to be removed.</message>

cleanup/forceone

Removes a single build. Requires secret_key and hash parameter in query.

methods: GET
parameters:
- secret_key (required)
- hash (required)
example: https://ws.spraakbanken.gu.se/ws/sparv/v2/cleanup/forceone?secret_key=supersekretkey&hash=1e1c4cdb04d593f1526ae21dd3908cfa7e6ca805
result:

<message>
    <removed hash="1e1c4cdb04d593f1526ae21dd3908cfa7e6ca805"/>
</message>

cleanup/forceall

Removes all the existing builds. Requires secret_key parameter in query.

methods: GET
parameters:
- secret_key (required)
example: https://ws.spraakbanken.gu.se/ws/sparv/v2/cleanup/forceall?secret_key=supersekretkey
result:

<message>
    <removed hash="1e1c4cdb04d593f1526ae21dd3908cfa7e6ca805"/>
    <removed hash="34dfdc2538023e44e7892ee9ac7f1071c6349544"/>
    <removed hash="2cac2b20734661dca6c388c46153aff79380d6d8"/>
</message>

Or if there are no builds:

<message>No hashes to be removed.</message>

Example settings

Swedish plain text input (default mode):

settings={
    "corpus": "exempelkorpus",
    "lang": "sv",
    "textmode": "plain",
    "word_segmenter": "default_tokenizer",
    "sentence_segmentation": {
        "sentence_chunk": "paragraph",
        "sentence_segmenter": "default_tokenizer"
    },
    "paragraph_segmentation": {
        "paragraph_segmenter": "blanklines"
    },
    "positional_attributes": {
        "lexical_attributes": [
            "pos",
            "msd",
            "lemma",
            "lex",
            "sense"
        ],
        "compound_attributes": [
            "complemgram",
            "compwf"
        ],
        "dependency_attributes": [
            "ref",
            "dephead",
            "deprel"
        ],
        "sentiment": [
            "sentiment"
        ]
    },
    "named_entity_recognition": [],
    "text_attributes": {
        "readability_metrics": [
            "lix",
            "ovix",
            "nk"
        ]
    }
}

Swedish with xml input:

settings={
    "corpus": "exempelkorpus",
    "lang": "sv",
    "textmode": "xml",
    "word_segmenter": "default_tokenizer",
    "sentence_segmentation": {
        "tag": "s",
        "attributes": [
            "number"
        ]
    },
    "paragraph_segmentation": {
        "tag": "p",
        "attributes": [
            "name"
        ]
    },
    "root": {
        "tag": "text",
        "attributes": [
            "title"
        ]
    },
    "extra_tags": [
        {
            "tag": "chapter",
            "attributes": [
                "name"
            ]
        }
    ],
    "positional_attributes": {
        "lexical_attributes": [
            "pos",
            "msd",
            "lemma",
            "lex",
            "sense"
        ],
        "compound_attributes": [
            "complemgram",
            "compwf"
        ],
        "dependency_attributes": [
            "ref",
            "dephead",
            "deprel"
        ],
        "sentiment": [
            "sentiment",
            "sentimentclass"
        ]
    },
    "named_entity_recognition": [],
    "text_attributes": {
        "readability_metrics": [
            "lix",
            "ovix",
            "nk"
        ]
    }
}

English (analysed with FreeLing):

settings={
    "corpus": "example",
    "lang": "en",
    "textmode": "xml",
    "root": {
        "tag": "text",
        "attributes": []
    },
    "extra_tags": [],
    "positional_attributes": {
        "lexical_attributes": [
            "pos",
            "msd",
            "lemma"
        ]
    },
    "text_attributes": {
        "readability_metrics": [
            "lix",
            "ovix",
            "nk"
        ]
    }
}

Finnish (analysed with TreeTagger):

settings={
    "corpus": "example",
    "lang": "fi",
    "textmode": "xml",
    "word_segmenter": "default_tokenizer",
    "sentence_segmentation": {
        "sentence_chunk": "paragraph",
        "sentence_segmenter": "default_tokenizer"
    },
    "paragraph_segmentation": {
        "paragraph_chunk": "text",
        "paragraph_segmenter": "blanklines"
    },
    "root": {
        "tag": "text",
        "attributes": [
            "title"
        ]
    },
    "extra_tags": [],
    "positional_attributes": {
        "lexical_attributes": [
            "pos",
            "msd",
            "lemma"
        ]
    },
    "text_attributes": {
        "readability_metrics": [
            "lix",
            "ovix",
            "nk"
        ]
    }
}

Swedish development mode (Sparv labs):

settings={
    "corpus": "exempelkorpus",
    "lang": "sv-dev",
    "textmode": "plain",
    "word_segmenter": "default_tokenizer",
    "sentence_segmentation": {
        "sentence_chunk": "paragraph",
        "sentence_segmenter": "default_tokenizer"
    },
    "paragraph_segmentation": {
        "paragraph_segmenter": "blanklines"
    },
    "positional_attributes": {
        "lexical_attributes": [
            "pos",
            "msd",
            "lemma",
            "lex",
            "sense"
        ],
        "compound_attributes": [
            "complemgram",
            "compwf"
        ],
        "dependency_attributes": [
            "ref",
            "dephead",
            "deprel"
        ],
        "lexical_classes": [
            "blingbring",
            "swefn"
        ],
        "sentiment": [
            "sentiment",
            "sentimentclass"
        ]
    },
    "named_entity_recognition": [
        "ex",
        "type",
        "subtype"
    ],
    "text_attributes": {
        "readability_metrics": [
            "lix",
            "ovix",
            "nk"
        ],
        "lexical_classes": [
            "blingbring",
            "swefn"
        ]
    }
}