Document

The Document class allows you to work with text data
Introduction

The Document class handles textual (document) data. It is useful in applications such as question answering systems and classification systems.

Contents

Document.View

Document.View displays documents to the end user, optionally alongside an output text for each document.

Example Applications:

  • Returning answers (and questions) from a question answering model

Example Usage
import feather as ftr
from my_model import my_qa_model

def init():
    return ftr.File.Upload(types=[".txt"], title="Upload articles you want to ask questions on")

def collect_questions(uploader):
    documents = uploader.get_text_files() # equivalent to 'uploader.get(format="text")'
    # documents = [
    #     {"name": "doc1.txt", "data": "Alexander III of Macedon (20/21 July 356 BC..."},
    #     {"name": "doc2.txt", "data": "Europa or Jupiter II, is the smallest of the..."}
    # ]

    document_questions = ftr.Document.WithTextIn(documents, default_text=None, max_chars=256,
                                                 title="Enter a question for each document", description=None)

    return document_questions

def run_qa(document_questions):
    questions = document_questions.get_text()
    # questions = ["Who was Alexander's father?", "What is the sixth largest moon in the solar system?"]

    documents = document_questions.documents
    answers = my_qa_model(documents, questions) # ["Philip II of Macedon", "Europa"]
    qas = ["Answer: {}; Question: {}".format(answer, question) for answer, question in zip(answers, questions)]

    #################### CONSTRUCTOR ####################
    return ftr.Document.View(documents, output_text=qas)
    #####################################################

if __name__ == "__main__":
    bundle = ftr.bundle(code_files=[__file__, "my_model.py"], model_files=["outputs/model.ckpt"])
    ftr.build(name="Question Answering Model", init=init, steps=[collect_questions, run_qa], file_bundle=bundle)
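
The example imports my_qa_model from my_model.py, which is the user's own code and is not shown. To try the pipeline end to end without a trained model, a toy stand-in with the same call shape (a list of documents and a list of questions in, a list of answer strings out) could look like the sketch below. Everything in it is an assumption made for illustration: it treats each document as a dict with a "data" key, as in the commented sample above, and it simply returns the sentence that shares the most words with the question.

```python
import re
from typing import Dict, List


def my_qa_model(documents: List[Dict[str, str]], questions: List[str]) -> List[str]:
    """Toy stand-in for a real QA model (hypothetical, for local testing only).

    For each (document, question) pair, return the sentence of the document
    that shares the most words with the question.
    """
    answers = []
    for doc, question in zip(documents, questions):
        question_words = set(re.findall(r"\w+", question.lower()))
        # Split the document text into sentences on ., ! or ? followed by whitespace.
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc["data"]) if s.strip()]
        best = max(
            sentences,
            key=lambda s: len(question_words & set(re.findall(r"\w+", s.lower()))),
            default="",  # empty document: no sentence to return
        )
        answers.append(best)
    return answers
```

A real model would replace the word-overlap heuristic; only the input and output shapes matter to the surrounding steps.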

Constructor:
  • documents: DocumentInputType - a list of documents (see DocumentInputType) to be shown to the end user
  • output_text: Optional[ListType[str]] = None - an optional list of output texts to be shown for a list of documents
    • If output_text is None, only the documents are displayed
    • If output_text is a ListType[str], its length must equal the length of documents. Each document in documents is displayed with its respective element of output_text
  • title: Optional[str] = None - a title semantically attached to the component
  • description: Optional[str] = None - a description semantically attached to the component
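
The length rule for output_text can be checked before the component is constructed. The helper below is a minimal sketch in plain Python; check_output_text is a hypothetical name and is not part of feather, and it assumes documents is an ordinary list.

```python
from typing import List, Optional, Sequence


def check_output_text(documents: Sequence, output_text: Optional[List[str]]) -> None:
    """Hypothetical pre-flight check; not part of the feather API."""
    if output_text is None:
        return  # only the documents will be displayed
    if len(output_text) != len(documents):
        raise ValueError(
            "output_text has {} entries but there are {} documents".format(
                len(output_text), len(documents)
            )
        )
```

Calling check_output_text(documents, qas) just before building the view surfaces a mismatch as a ValueError in your own step function.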

Attributes:
  • documents - get the documents passed into this component

Component Playground:
ftr.Document.View(
    documents=[
        {
            "name": "doc1.txt",
            "data": "Alexander III of Macedon (20/21 July 356 BC..."
        },
        {
            "name": "doc2.txt",
            "data": "Europa or Jupiter II, is the smallest of the..."
        }
    ],
    output_text=[
        "Answer: Philip II of Macedon; Question: Who was Alexander's father?",
        "Answer: Europa; Question: What is the sixth largest moon in the solar system?"
    ],
    title="Your Question Answer outputs",
    description="Click a document on the left to view the answer to your question")

Document.WithTextIn

Document.WithTextIn allows you to collect text input from the user for each document.

Example Applications:

  • Getting a question for a document in a question answering model

Example Usage
import feather as ftr
from my_model import my_qa_model

def init():
    return ftr.File.Upload(types=[".txt"], title="Upload articles you want to ask questions on")

def collect_questions(uploader):
    documents = uploader.get_text_files() # equivalent to 'uploader.get(format="text")'
    # documents = [
    #     {"name": "doc1.txt", "data": "Alexander III of Macedon (20/21 July 356 BC..."},
    #     {"name": "doc2.txt", "data": "Europa or Jupiter II, is the smallest of the..."}
    # ]

    #################### CONSTRUCTOR ####################
    document_questions = ftr.Document.WithTextIn(documents, default_text=None, max_chars=256,
                                                 title="Enter a question for each document", description=None)
    #####################################################

    return document_questions

def run_qa(document_questions):
    #################### ACCESSOR ####################
    questions = document_questions.get_text()
    # questions = ["Who was Alexander's father?", "What is the sixth largest moon in the solar system?"]
    ##################################################

    documents = document_questions.documents
    answers = my_qa_model(documents, questions) # ["Philip II of Macedon", "Europa"]
    qas = ["Answer: {}; Question: {}".format(answer, question) for answer, question in zip(answers, questions)]
    return ftr.Document.View(documents, output_text=qas)

if __name__ == "__main__":
    bundle = ftr.bundle(code_files=[__file__, "my_model.py"], model_files=["outputs/model.ckpt"])
    ftr.build(name="Question Answering Model", init=init, steps=[collect_questions, run_qa], file_bundle=bundle)

Constructor:
  • documents: DocumentInputType - a list of documents (see DocumentInputType) to be shown to the end user
  • default_text: Optional[Union[ListType[str], str]] = None - optional default text to be shown in the text input boxes.
    • If default_text is None, text boxes with no default text in them are displayed
    • If default_text is a str, the same text is used for every document provided above
    • If default_text is a ListType[str], its length must equal the length of documents
  • max_chars: Optional[int] = 256 - the maximum number of characters a user can enter in a text box.
  • title: Optional[str] = None - a title semantically attached to the component
  • description: Optional[str] = None - a description semantically attached to the component
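
The three default_text cases can be pictured as a small normalization step. The sketch below is an illustration in plain Python, not the library's implementation: expand_default_text is a hypothetical helper, and representing "no default text" as an empty string is an assumption made here.

```python
from typing import List, Optional, Sequence, Union


def expand_default_text(
    documents: Sequence,
    default_text: Optional[Union[List[str], str]] = None,
) -> List[str]:
    """Hypothetical illustration of the default_text rules; not part of feather."""
    if default_text is None:
        # Assumption: an empty string stands for a text box with no default text.
        return [""] * len(documents)
    if isinstance(default_text, str):
        # A single string is repeated once per document.
        return [default_text] * len(documents)
    if len(default_text) != len(documents):
        raise ValueError(
            "default_text has {} entries but there are {} documents".format(
                len(default_text), len(documents)
            )
        )
    return list(default_text)
```

For two documents, expand_default_text(documents, "Ask something") yields the same prompt twice, while passing a two-element list keeps one prompt per document.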

Attributes:
  • documents - get the documents passed into this component

Accessors:
  • get_text() -> ListType[str] - the text entered by the user, as a list of strings
  • get() - see below
get() (usage of .get() is not recommended; use get_text() instead)
def get(self):
    return self.get_text()

Component Playground:
ftr.Document.WithTextIn(
    documents=[
        {
            "name": "doc1.txt",
            "data": "Alexander III of Macedon (20/21 July 356 BC..."
        },
        {
            "name": "doc2.txt",
            "data": "Europa or Jupiter II, is the smallest of the..."
        }
    ],
    default_text=[
        "Who was Alexander's father?",
        "What is the sixth largest moon in the solar system?"
    ],
    max_chars=256,
    title="Enter questions based on each document",
    description="Make sure the question is answerable given the document text")

Document specific types
# N.B. "ListType" in our documentation refers to the standard Python List type
# (so as not to be confused with our 'List' class)
from typing import List as ListType, Union

class DocumentInputObj:
    name: str
    data: str

DocumentInputType = Union[ListType[DocumentInputObj], ListType[str]]
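
Because DocumentInputType admits two shapes, code that consumes documents (for example a model function) may want to reduce both to plain text first. The helper below is a sketch under two assumptions that the page does not state outright: a DocumentInputObj arrives as a dict with "name" and "data" keys, as in the commented samples, and a bare str entry is the document text itself. document_texts is a hypothetical name, not a feather function.

```python
from typing import Dict, List, Union


def document_texts(documents: List[Union[Dict[str, str], str]]) -> List[str]:
    """Hypothetical helper: extract the text of each document, whichever shape it has."""
    texts = []
    for doc in documents:
        if isinstance(doc, str):
            texts.append(doc)  # assumption: a bare string is the document text
        else:
            texts.append(doc["data"])  # assumption: dict with "name" and "data" keys
    return texts
```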