Documentation (backend.py)

Available calls to the backend

Public calls (may require password for protected lexicons):
modes see the hierarchy of the modes
groups see the available index modes and their components
modeinfo see the available search fields of a mode
lexiconinfo see the available search fields of a lexicon
lexiconorder for retrieving the lexicon order
query for normal querying
querycount for querying with some statistics
minientry for getting minientries
statistics for getting statistics, aggregation
statlist for getting statistics, table view
autocomplete for autocompletion of lemgrams
saldopath for showing the path from a saldo sense to PRIM
getcontext for showing the (alphabetical) neighbours of an entry
explain for debugging a query (shows the ElasticSearch query and its explanation)
random for retrieving a random lexical entry
suggest make an update suggestion
suggestnew suggest a new entry
checksuggestion see the status of a suggestion

Password protected calls:
delete delete an entry by #ID
mkupdate update a lexical entry
add add a lexical entry
readd add an entry which already has an id (one that has been deleted)
addbulk add multiple entries
add_child add an entry and link it to its parent
checksuggestions view suggestions
acceptsuggestion accept a suggestion
acceptandmodify accept a suggestion after modifications have been made to it
rejectsuggestion reject a suggestion
checkuser for checking whether the user is ok
checkuserhistory for retrieving the edit history of a user
checkhistory for retrieving the edit history of an entry
checklexiconhistory for retrieving the edit history of one or more lexicons
checkdifference see the difference between two versions
export export a lexicon

Query language

The query language is simple and based on the language used in Karp's frontend.

Operators

Modes

The lexicons in Karp are divided into groups. A group consists of lexicons that have a similar document structure. Each group has its own set of search fields, ordering functions etc. Groups that are sometimes queried together are gathered into modes. The modes often, but not always, correspond to the frontend modes. Every group is also a mode. More information about the available groups and modes is given by the groups and modes calls.

Search fields

Which search fields are available depends on the mode you are using. To see all fields for SB's default lexicons, go to http://ws.spraakbanken.gu.se/modeinfo/karp

query

Login may be required for some lexicons.

Available query parameters:
'q': the query
'mode': which mode to search in. default: karp
'resource': one or more comma separated lexicons to search. default: all
'size': number of hits to show on each page. default: 25
'start': the index of the first hit to show. default: 0
'sort': one or more comma separated fields to sort on. default: depends on mode, usually lexiconOrder, score, baseform, lemgram
'format': get the result in another format. Which options are available depends on the lexicon and mode. examples: xml, lmf, tsb, tab, csv, app, tryck.
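
As a rough sketch, the parameters above could be combined into a request URL as follows. The base URL, the "/query" path and the helper name build_query_url are assumptions made for illustration, not part of the backend; the query string "simple||katt" only assumes the "simple||..." freetext form mentioned in this documentation, and the lexicon names are examples.

```python
from urllib.parse import urlencode

# Placeholder address (assumption): replace with the Karp backend you are using.
BASE_URL = "https://example.org/karp"


def build_query_url(q, mode="karp", resource=None, size=25, start=0, sort=None, fmt=None):
    """Build a URL for the 'query' call from the documented parameters."""
    params = {"q": q, "mode": mode, "size": size, "start": start}
    if resource:
        # 'resource' and 'sort' are comma separated lists
        params["resource"] = ",".join(resource)
    if sort:
        params["sort"] = ",".join(sort)
    if fmt:
        params["format"] = fmt
    return BASE_URL + "/query?" + urlencode(params)


url = build_query_url("simple||katt", resource=["saldo", "saldom"], size=10)
print(url)
```

Optional parameters are left out of the URL when they are not given, so the backend's own defaults apply.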

Formatting

Asking for another format will not return the json objects. For saol, the result will be

                      {"hits": {
                        "hits":   an xml string
                        ...
                        }
                      }

For other resources, the result will be

                     {"formatted": a string in the current format
                       ...
                     }
                      

NB. The structure of the result for format-posts might change!

Examples:

Result:

hits{
  total : number of hits
  hits:   a list with information about the hits
     hits[n]._source:      all information in hit n
     hits[n]._source._id   elasticsearch's identifier for the entry
     hits[n]._version:     the current version of this entry
     hits[n]._source.lexiconName
     hits[n]._source.lexiconOrder
     hits[n]._source.FormRepresentations
     hits[n]._source.Sense
     hits[n]._source.WordForms

     hits[n]._source.ListOfComponents
     hits[n]._source.RelatedForm
     hits[n]._source.compareWith
     hits[n]._source.entryType
     hits[n]._source.saldoLinks
     hits[n]._source.see
     hits[n]._source.symbolCenter
     hits[n]._source.symbolHeight
     hits[n]._source.symbolPath
     hits[n]._source.symbolWidth

     For freetext search (simple||...) only:
     hits[n].highlight       information and paths about the matching part of the entry
}
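
The structure above can be walked with ordinary dictionary access. The following is a minimal sketch that assumes the result has already been parsed from json into a Python dict; the function name and the sample data are hand-written for illustration and are not real backend output.

```python
def summarize_hits(result):
    """Return (total, [(lexiconName, _version), ...]) from a query result."""
    hits = result.get("hits", {})
    rows = []
    for hit in hits.get("hits", []):
        source = hit.get("_source", {})
        rows.append((source.get("lexiconName"), hit.get("_version")))
    return hits.get("total", 0), rows


# Hand-written sample shaped like the structure described above.
sample = {
    "hits": {
        "total": 2,
        "hits": [
            {"_version": 3, "_source": {"lexiconName": "saldom", "lexiconOrder": 1}},
            {"_version": 1, "_source": {"lexiconName": "konstruktikon", "lexiconOrder": 2}},
        ],
    }
}

total, rows = summarize_hits(sample)
print(total, rows)
```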

querycount

Works like a normal query, but also shows the distribution over lexicons. It is a mix of query and statistics. The distribution results are sorted in lexicon order.

Example:

Result:

  hits{...}                                          as above
  distribution                                       a list of counts for each lexicon that contains at least one match
  distribution[n].key                                the order for the n:th lexicon
  distribution[n].doc_count                          the count for the n:th lexicon
  distribution[n].lexiconName.buckets.[0].doc_count  the count for the n:th lexicon (again)
  distribution[n].lexiconName.buckets.[0].key        the name of the n:th lexicon
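
A small sketch of how the distribution could be turned into a name-to-count mapping, assuming the result has been parsed into a Python dict. The function name and the sample values are illustrative only.

```python
def lexicon_counts(result):
    """Map lexicon name -> number of matches, keeping the returned order."""
    counts = {}
    for entry in result.get("distribution", []):
        buckets = entry.get("lexiconName", {}).get("buckets", [])
        # Fall back to the lexicon order key if no name bucket is present.
        name = buckets[0]["key"] if buckets else str(entry.get("key"))
        counts[name] = entry.get("doc_count", 0)
    return counts


# Hand-written sample shaped like the structure described above.
sample = {
    "hits": {"total": 91, "hits": []},
    "distribution": [
        {"key": 1, "doc_count": 88,
         "lexiconName": {"buckets": [{"key": "konstruktikon", "doc_count": 88}]}},
        {"key": 5, "doc_count": 3,
         "lexiconName": {"buckets": [{"key": "saldom", "doc_count": 3}]}},
    ],
}

counts = lexicon_counts(sample)
print(counts)
```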

minientry

Returns a mini variant of the normal query result. Only shows the fields specified in "show", but otherwise works the same way as query.

Available query parameters:
'mode': which mode to search in. default: karp
'q': the query
'resource': one or more comma separated lexica to search. default: all
'show': one or more comma separated fields to show. default: depends on mode, usually lexiconName, lemgram, baseform
'size': number of hits to show on each page. default: 25

Example:

Result: See query.

statistics

Aggregation, as provided by ElasticSearch. Shows the number of hits, grouped by the requested fields. By default, shows the distribution grouped by lexicon and pos tags.

Available query parameters:
'q': the query
'mode': which mode to search in. default: karp
'resource': one or more comma separated lexica to search. default: all
'size': number of hits to show in each bucket. default: 100
'buckets': one or more comma separated fields to group the results by. default: lexiconName, pos
'cardinality': shows the cardinality (the number of values) for the innermost of the requested buckets, instead of showing the actual values. Not compatible with 'q'. 'size' will be ignored.

Example:

The result is not sorted.

Result: X is the name of the first bucket (default: lexiconName), Y (defaults to pos) the second and so on. For a search where a query or a resource is specified:

aggregations {
    q_statistics.doc_count : total number of hits
    
    q_statistics.X: information about the data grouped by X (the first bucket)
    q_statistics.X_missing: information about the data missing X

      q_statistics.X.buckets[n].key: X value
      q_statistics.X.buckets[n].doc_count: number of hits within the X value

      q_statistics.X.buckets[n].Y: information grouped by X and then Y
      q_statistics.X.buckets[n].Y.doc_count: number of hits within the Y value in X


      q_statistics.X_missing.buckets[n]: information about entries which do not have any X value
      q_statistics.X_missing.buckets[n].doc_count: number of hits that do not have any X value

      q_statistics.X.buckets[n].Y_missing: information grouped by X and then Y, showing cases without any value for Y
      q_statistics.X.buckets[n].Y_missing.doc_count: number of hits in X that do not have any Y value

      ....
}
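
The nested buckets can be flattened into rows, roughly as sketched below. This assumes that the top-level key is spelled q_statistics, that the inner level Y has a "buckets" list shaped like the outer one, and that Y_missing carries a doc_count; these details are inferred from the description above rather than confirmed. The function name and sample are hypothetical.

```python
def flatten_buckets(aggregations, first, second):
    """Return [(first_value, second_value, doc_count), ...] for two bucket levels."""
    rows = []
    stats = aggregations.get("q_statistics", {})
    for outer in stats.get(first, {}).get("buckets", []):
        for inner in outer.get(second, {}).get("buckets", []):
            rows.append((outer["key"], inner["key"], inner["doc_count"]))
        # Entries without a value for the second field are reported separately.
        missing = outer.get(second + "_missing", {}).get("doc_count", 0)
        if missing:
            rows.append((outer["key"], None, missing))
    return rows


# Hand-written sample; the inner layout is an assumption (see above).
sample = {
    "q_statistics": {
        "doc_count": 91,
        "lexiconName": {
            "buckets": [
                {"key": "saldom", "doc_count": 3,
                 "pos": {"buckets": [{"key": "nn", "doc_count": 3}]}},
                {"key": "konstruktikon", "doc_count": 88,
                 "pos": {"buckets": []},
                 "pos_missing": {"doc_count": 88}},
            ]
        },
    }
}

rows = flatten_buckets(sample, "lexiconName", "pos")
print(rows)
```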

statlist

Gives a table view based on the bucketed aggregations (see statistics). Shows the number of hits, grouped by the requested fields. By default, shows the distribution grouped by lexicon and pos tags.

Available query parameters:
'q': the query
'mode': which mode to search in. default: karp
'resource': one or more comma separated lexica to search. default: all
'buckets': one or more comma separated fields to group the results by. default: lexiconName, pos
'size': number of hits to show in each bucket. Hence, it does not correspond to the number of table rows. default: 100.

Example:

Result:

{
  "stat_table": [
    [
      "konstruktikon",
      "",
      88
    ],
    [
      "saldom",
      "nn",
      3
    ],
    ...
  ]
}
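
Since each row ends with a count, the table is easy to post-process. A minimal sketch, assuming the first column is the lexicon name (as with the default buckets) and the last column is the count; the function name is hypothetical and the third sample row is made up.

```python
from collections import defaultdict


def totals_per_lexicon(result):
    """Sum the counts in 'stat_table' per value of the first column."""
    totals = defaultdict(int)
    for *fields, count in result["stat_table"]:
        totals[fields[0]] += count
    return dict(totals)


# The first two rows are taken from the example above, the third is invented.
sample = {
    "stat_table": [
        ["konstruktikon", "", 88],
        ["saldom", "nn", 3],
        ["saldom", "vb", 2],
    ]
}

totals = totals_per_lexicon(sample)
print(totals)
```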

autocomplete

Gives suggestions for lemgrams (or other fields, as specified for each mode) which have a word form (or other field, as specified for each mode) matching the given one.

It does not match prefixes. Searching for "sig" hence does not give suggestions like "sigill" or "signatur".

Provides lemgram suggestions to Korp, by looking in mode 'external'.

Examples:

Available query parameters:
'q': the query, a word form
'multi': a comma separated list of queries (word forms). do not use together with q
'resource': one or more comma separated lexica to search. default: all
'mode': which mode to search in. default: karp

The result is not sorted and one lemgram may occur multiple times.

Result:

hits{
  total : number of hits
  hits  : information about the hits
  hits[n]._source : information about hit n
  hits[n]._source.FormRepresentations.lemgram  : lemgram
}
If the 'multi' parameter is used, the output will be a dictionary with one key corresponding to every input word. The values have the same format as for q:
{"kasta": {"hits": ...  },
 "docka": {"hits": ...  }
}
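
Because the result is unsorted and may repeat lemgrams, a client will usually want to deduplicate. The sketch below assumes the result is a parsed Python dict and accepts FormRepresentations either as a single object or as a list of objects, since the description above does not say which. Function names and lemgram values are illustrative.

```python
def lemgrams(result):
    """Collect lemgrams from an autocomplete result, dropping duplicates."""
    found = []
    for hit in result.get("hits", {}).get("hits", []):
        forms = hit.get("_source", {}).get("FormRepresentations", [])
        if isinstance(forms, dict):  # tolerate a single object as well as a list
            forms = [forms]
        for form in forms:
            lemgram = form.get("lemgram")
            if lemgram and lemgram not in found:
                found.append(lemgram)
    return found


def lemgrams_multi(result):
    """Same as lemgrams(), for the dictionary returned when 'multi' is used."""
    return {word: lemgrams(answer) for word, answer in result.items()}


# Hand-written sample, not real backend output.
sample = {
    "kasta": {"hits": {"total": 2, "hits": [
        {"_source": {"FormRepresentations": [{"lemgram": "kasta..vb.1"}]}},
        {"_source": {"FormRepresentations": [{"lemgram": "kasta..vb.1"}]}},
    ]}},
    "docka": {"hits": {"total": 2, "hits": [
        {"_source": {"FormRepresentations": {"lemgram": "docka..nn.1"}}},
        {"_source": {"FormRepresentations": [{"lemgram": "docka..vb.1"}]}},
    ]}},
}

suggestions = lemgrams_multi(sample)
print(suggestions)
```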

saldopath

Shows the path from a saldo sense to PRIM. Only works for saldo senses.

Example:

Result:

{ path: [input_sense..1, ... , PRIM..1] }

getcontext

Shows the (alphabetical) neighbours of an entry

The sorting order is based on the mode configs. (The order must be strict, e.g. no two words may have the same score. If they do, getcontext will not work properly.)

Example:

Available query parameters:
'center': the ES-ID of the entry to center the search around. default: the first entry
'q': an optional query to restrict entries that appear in the result
'size': number of hits to show on each side of the center word. default: 10
Result:

{ center: [ *the centered entry* ],
  pre:    [ *a list of hits (that match the query) occurring immediately before the centered entry* ],
  post:   [ *a list of hits (that match the query) occurring immediately after the centered entry *]}

explain

A tool for debugging. Shows the result of the given query, the json formatted query as sent to ElasticSearch and the information given by a _validate/query?explain call to ElasticSearch.

Example:

Result:

ans                  the normal query result
elastic_json_query   the query translated to Elastic's api
explain              Elastic's answer to a _validate/query?explain query.

modes

Shows the hierarchy of the modes.

groups

Shows the available groups (modes to which updates can be made).

modeinfo

Shows the search fields that are available in a mode.

lexiconinfo

Shows the search fields that are available in a lexicon. Note that the fields are set on the level of lexicon groups - some fields in the listing might be unused by the specified lexicon.

lexiconorder

Shows the lexicons and their order. The query results are also ordered in this way.

Result:

{
  "lexiconA": 1,
  "lexiconB": 3,
  "lexiconC": 8,
  ...
}
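
Since the values need not be consecutive, a client that wants the lexicons as an ordered list has to sort on them. A minimal sketch using the placeholder names from the example above; the function name is hypothetical.

```python
def lexicons_in_order(order):
    """Return the lexicon names sorted by their order value."""
    return sorted(order, key=order.get)


sample = {"lexiconC": 8, "lexiconA": 1, "lexiconB": 3}
ordered = lexicons_in_order(sample)
print(ordered)
```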

random

Shows a randomly selected lexical entry.

Available query parameters:
'resource': one or more comma separated lexica to search. default: all
'mode': which mode to search in. default: karp
'show': one or more comma separated fields to show. default: depends on mode, usually lexiconName, lemgram, baseform
'show_all': all fields of the entries will be shown if this flag is set to true (or any value). Overrides show
'size': number of hits to show on each page. default: 1

Example:

suggest, suggestnew

Works like update, but requires no login. The suggestion is stored in a separate system until it has been accepted by a logged-in user.

The field 'version' is optional, but prevents an old version from later overriding a newer one.

Example:

Result:

{
  "es_ans": {...}  // output from ES
  "es_loaded": 1,  // 1 if the suggestion is stored in ES
  "id":           // the #ID of the suggestion. Can be used to see the current status, accept or reject the suggestion
  "sql_loaded": 1,  // 1 if the suggestion is stored in SQL
  "suggestion": true,
  sql_error // present if there were errors storing the suggestion
}

delete

Deletes one entry, identified by its lexicon and #ID. Requires a valid username and password.

Examples:

Result:

{
'sql_loaded' : 1 if successfully marked as deleted in the SQL database,
'es_loaded' : 1 if successfully deleted from ElasticSearch (is no longer searchable),
'es_ans'    : the answer from ES.
}

mkupdate

Updates a lexical entry identified by its lexicon and Elasticsearch's id (#ID). Requires a valid username and password. The IDs are found in any query results.

To avoid conflicts, the last known version number of the entry can be provided as a query string. If a version number is provided and the database entry has been updated since our last read, a version conflict error message will be returned.

Example:

Result:

{
'sql_loaded' : 1 if successfully saved in the SQL database,
'es_loaded' : 1 if successfully stored in ElasticSearch (is searchable),
'es_ans'    : {'_id':..., '_index':..., '_type':..., '_version': ...} //the answer from ES.
}

Error messages:

Version conflict:

{"message": "Database exception: Error during update. Message: TransportError(409, u'RemoteTransportException[...]; nested: VersionConflictEngineException[...]: version conflict, current [3], provided [1]]; ')."}
ID could not be found:
{"message": "Database exception: Error during update. Message: TransportError(404, u'RemoteTransportException[...]; nested: DocumentMissingException[...]: document missing]; ')"}
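
A client may want to tell these two errors apart, for instance to re-read the entry after a version conflict. The sketch below does so by pattern matching on the message text shown above; it assumes the messages keep this wording, which the backend does not guarantee. The function name and the returned labels are hypothetical.

```python
import re


def classify_update_error(response):
    """Return (kind, current_version, provided_version) for an error response."""
    message = response.get("message", "")
    match = re.search(r"TransportError\((\d+)", message)
    status = int(match.group(1)) if match else None
    if status == 409:
        versions = re.search(r"current \[(\d+)\], provided \[(\d+)\]", message)
        if versions:
            return ("version_conflict", int(versions.group(1)), int(versions.group(2)))
        return ("version_conflict", None, None)
    if status == 404:
        return ("not_found", None, None)
    return ("unknown", None, None)


# The two messages are copied from the examples above.
conflict = {"message": "Database exception: Error during update. Message: "
                       "TransportError(409, u'RemoteTransportException[...]; nested: "
                       "VersionConflictEngineException[...]: version conflict, "
                       "current [3], provided [1]]; ')."}
missing = {"message": "Database exception: Error during update. Message: "
                      "TransportError(404, u'RemoteTransportException[...]; nested: "
                      "DocumentMissingException[...]: document missing]; ')"}

print(classify_update_error(conflict))
print(classify_update_error(missing))
```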

add, readd

Adds a lexical entry. Requires a valid username and password. The given ID is found in the returned object and is associated with the entry in ElasticSearch and in the SQL database.

Example:

If an entry has existed in the database before and already has an ID, it can be re-added (with readd) to get the same ID:

Result:

{
'sql_loaded' : 1 if successfully saved in the SQL database,
'es_loaded' : 1 if successfully stored in ElasticSearch (is searchable),
'es_ans'    : {'_id':..., '_index':..., '_type':..., '_version': ..., 'created' : True} //the answer from ES. ,
'suggestion': False. True if the update has been treated as suggestion.
}
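
A caller will typically check both flags before using the new ID. A minimal sketch, assuming the result has been parsed into a Python dict and that the ID is the '_id' field of 'es_ans' as listed above; the function name and the sample values are made up.

```python
def new_entry_id(result):
    """Return the ID of a newly added entry, or raise if a store failed."""
    if result.get("sql_loaded") != 1 or result.get("es_loaded") != 1:
        raise RuntimeError("entry was not stored in both SQL and ElasticSearch")
    return result["es_ans"]["_id"]


# Hand-written sample shaped like the result described above.
sample = {
    "sql_loaded": 1,
    "es_loaded": 1,
    "es_ans": {"_id": "abc123", "_index": "...", "_type": "...",
               "_version": 1, "created": True},
    "suggestion": False,
}

entry_id = new_entry_id(sample)
print(entry_id)
```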

addbulk

Adds a list of entries to a lexicon. Requires a valid username and password.

Example:

Result:

{
'sql_loaded' : number of entries successfully saved in the SQL database,
'es_loaded' : number of entries successfully stored in ElasticSearch,
'ids'    : a list of the new entries' IDs,
'suggestion': False
}

add_child

Adds a lexical entry and links it to its parent. Requires a valid username and password. The given ID is found in the returned object and is associated with the entry in ElasticSearch and in the SQL database.

Example:

Result:

{
'parent':   the result of adding the link to the parent (see mkupdate).
'child' :   the result of adding the child (see add).
}

checksuggestions

Requires login. Shows the suggestions for the chosen lexicons.

Required query parameters:
'resource': one or more comma separated lexicons to search. default: all

Available query parameters:
'size': the number of suggestions to view (order by decreasing date). default: 50
'status': waiting, rejected, accepted. default: all

Example:

Result:

{
  "updates": [
    {
      "acceptmessage"  // is set when the suggestion is accepted or rejected
      "date":          // the date of suggestion
      "doc":           // the suggested lexical entry
      "id":            // the #ID of the suggestion
      "lexicon":       // the lexicon it belongs to
      "message":       // message from the suggester
      "origid":        // the #ID of the entry the suggestion concerns
      "status":        // the status of the suggestion (waiting, accepted or rejected)
      "user":          // the name or email address of the suggester
      "version":       // the version of the entry that the suggestion concerns
    }
  ]
}

checksuggestion

Checks the status of a given suggestion. Does not require login.

Example:

Result:

{
  "updates": [
    {
      "acceptmessage"  // is set when the suggestion is accepted or rejected
      "date":          // the date of suggestion
      "doc":           // the suggested lexical entry
      "id":            // the #ID of the suggestion
      "lexicon":       // the lexicon it belongs to
      "message":       // message from the suggester
      "origid":        // the #ID of the entry the suggestion concerns
      "status":        // the status of the suggestion (waiting, accepted or rejected)
      "user":          // the name or email address of the suggester
      "version":       // the version of the entry that the suggestion concerns
    }
  ]
}

acceptsuggestion

Requires login. Changes the status of a suggestion to "accepted" and moves it to the live database. Accepted suggestions are still present in the SQL database (but not searchable through the suggestion ES).

Example:

Result:

{
  "es_ans": {
    "_id":  // #ID of the updated entry
    "_index":
    "_type":
    "_version":
  },
  "es_loaded":  // 1 if successfully loaded to ES
  "sql_loaded": // 1 if successfully loaded to the live SQL
  "sugg_db_error": // present if there were errors storing the suggestion
  "sugg_db_loaded":  // 1 if successfully loaded to suggestion SQL
  "sugg_es_ans": {"es_ans" : {...} // ans from ES
                 ,"es_loaded":  // 1 if removed from the suggestion ES
                 ,"sql_loaded": // 1 if the suggestion was marked as accepted
                 }
}

acceptandmodify

Requires login. Changes the status of a suggestion to "accepted_modified" and adds the new, modified version to the live database. Accepted suggestions are still present in the SQL database (but not searchable through the suggestion ES).

Example:

Result:

{
  "es_ans": {
    "_id":  // #ID of the updated entry
    "_index":
    "_type":
    "_version":
  },
  "es_loaded":  // 1 if successfully loaded to ES
  "sql_loaded": // 1 if successfully loaded to the live SQL
  "sugg_db_error": // present if there were errors storing the suggestion
  "sugg_db_loaded":  // 1 if successfully loaded to suggestion SQL
  "sugg_es_ans": {"es_ans" : {...} // ans from ES
                 ,"es_loaded":  // 1 if removed from the suggestion ES
                 ,"sql_loaded": // 1 if the suggestion was marked as accepted
                 }
}

rejectsuggestion

Requires login. Changes the status of a suggestion to "rejected". Rejected suggestions are still present in the SQL database (but not searchable through the suggestion ES).

Example:

Result:

{
  "es_ans": {...} // output from the deletion from the suggestion ES
  "es_loaded":  // 1 if successfully removed from the suggestion ES
  "sugg_db_error": // present if there were errors storing the suggestion
  "sugg_db_loaded":  // 1 if successfully loaded to suggestion SQL
}

checkuser

Checks whether the provided user log-in details are ok.

Result:

{
  "authenticated":   is the user name and password ok,
  "permitted_resources.lexica": lexicons that the user may see or edit
}

checkuserhistory

Shows the edit history of the user, ordered from newest to oldest. Available query parameters:
'size': number of hits to show on each page. default: 10

Result:

{
  "updates": [
    {"date"
    ,"doc"        //the entry that has been edited
    ,"id"
    ,"message"
    ,"user"
    },
    ...
  ]
}

checkhistory

Shows the edit history of an entry, selected by its identifier and ordered from newest to oldest. Available query parameters:
'size': number of hits to show on each page. default: 10
Example:

Result:

{
  "updates": [
    {"date"
    ,"doc"        //the entry that has been edited
    ,"id"
    ,"message"
    ,"user"
    },
    ...
  ]
}

checklexiconhistory

Shows the edit history of one lexicon, ordered from newest to oldest. If no lexicon is specified, all lexicons are included. Available query parameters:
'size': number of hits to show on each page. default: 10

Examples. If a date is provided (in the correct format), only updates made later than this date are shown.

Result:

{
  "updates": [
    {"date"
    ,"doc"        //the entry that has been edited
    ,"id"
    ,"message"
    ,"user"
    ,"type"  // CHANGED, ADDED or REMOVED, only present if checklexiconhistory is called
    },
    ...
  ]
}
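
The 'type' field makes it simple to summarize what has happened to a lexicon. A minimal sketch, assuming the result has been parsed into a Python dict; the function name, the "UNKNOWN" fallback label and the sample are illustrative.

```python
from collections import Counter


def count_by_type(result):
    """Count the updates of a lexicon history per type (CHANGED, ADDED, REMOVED)."""
    return Counter(update.get("type", "UNKNOWN") for update in result.get("updates", []))


# Hand-written sample shaped like the result described above.
sample = {
    "updates": [
        {"date": "...", "doc": {}, "id": "a", "message": "", "user": "u1", "type": "CHANGED"},
        {"date": "...", "doc": {}, "id": "b", "message": "", "user": "u2", "type": "ADDED"},
        {"date": "...", "doc": {}, "id": "a", "message": "", "user": "u1", "type": "CHANGED"},
    ]
}

counts = count_by_type(sample)
print(counts)
```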

checkdifference

Shows the difference between two versions of the entry with the chosen #ID.

Example.

Result:

{ "diff" : [

   {"field"  // a field that has been changed between the two versions
   ,"after"  // the content of the field in the later of the two versions
   ,"before" // the content of the field in the older of the two versions (not present if the field is added)
   ,"type"   // added, changed or removed
   }
]}
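
The diff list can be rendered as readable lines, for example as sketched below. The function name, the sample and the "+", "-" and "~" markers are choices made for this illustration; the field names are those listed above.

```python
def describe_diff(result):
    """Turn a checkdifference result into one readable line per changed field."""
    lines = []
    for change in result.get("diff", []):
        kind = change.get("type")
        field = change.get("field")
        if kind == "added":
            lines.append(f"+ {field}: {change.get('after')!r}")
        elif kind == "removed":
            lines.append(f"- {field}: {change.get('before')!r}")
        else:
            lines.append(f"~ {field}: {change.get('before')!r} -> {change.get('after')!r}")
    return lines


# Hand-written sample shaped like the result described above.
sample = {
    "diff": [
        {"field": "pos", "type": "changed", "before": "nn", "after": "vb"},
        {"field": "comment", "type": "added", "after": "new"},
    ]
}

lines = describe_diff(sample)
print("\n".join(lines))
```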

export

Exports a lexicon. Requires a valid username and password.

Available parameters:
'date': export the entries as they were on a given date. default: latest
'export': export to a format other than json, e.g. csv, tsv, xml. Not available for all lexicons! default: json
'size': number of hits to show. default: all entries

Example.

Result (if json):

{ "#lexicon" : [
     ... the lexicon ...
   ]
}