  • Audit Framework Control ID Detection
  • Named Entity Extraction (organizations, dates, and more)
  • Decryption of encrypted PDFs
  • Translation of foreign-language PDFs
  • Document Classification
  • Document Section Detection
  • the ability to execute in parallel
  • translating a foreign language document,
  • processing OCR results into raw text,
  • detecting keywords inside text,
  • running machine learning inference on text.
  • PaginatedText depends on a consumable PDF and creates a list of strings
    – RunDocInference depends on both and classifies the document
    – KeywordDetection depends on PaginatedText and produces keyword matches
  • CreateCSVOutput depends on RunDocInference and KeywordDetection and produces a formatted CSV of their outputs.
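
The dependency chain above can be sketched as a small task graph that is resolved in topological order. This is only an illustrative sketch, not the project's actual implementation: the function bodies, the `ConsumablePDF` task name, the `TASKS` table, and `run_pipeline` are all hypothetical, the "model" is a placeholder rule, and the sketch assumes that "both" for RunDocInference means the consumable PDF and the paginated text. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+).

```python
import csv
import io
from graphlib import TopologicalSorter


# Hypothetical task functions; each one receives its dependencies' outputs.
def consumable_pdf():
    # Stand-in for a decrypted, translated PDF: here just a list of page strings.
    return ["Control AC-2 applies to accounts.", "Audit period: 2020."]


def paginated_text(pdf):
    # Produces a list of strings from the consumable PDF.
    return list(pdf)


def run_doc_inference(pdf, pages):
    # Placeholder rule standing in for real machine learning inference.
    return "audit_report" if any("audit" in p.lower() for p in pages) else "other"


def keyword_detection(pages, keywords=("control", "audit")):
    # Returns (page_number, keyword) pairs for every keyword found on a page.
    return [
        (number, keyword)
        for number, page in enumerate(pages, start=1)
        for keyword in keywords
        if keyword in page.lower()
    ]


def create_csv_output(label, matches):
    # Formats the classification and the keyword matches as CSV text.
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["doc_class", "page", "keyword"])
    for page, keyword in matches:
        writer.writerow([label, page, keyword])
    return buffer.getvalue()


# Task name -> (function, ordered list of dependency names).
TASKS = {
    "ConsumablePDF": (consumable_pdf, []),
    "PaginatedText": (paginated_text, ["ConsumablePDF"]),
    "RunDocInference": (run_doc_inference, ["ConsumablePDF", "PaginatedText"]),
    "KeywordDetection": (keyword_detection, ["PaginatedText"]),
    "CreateCSVOutput": (create_csv_output, ["RunDocInference", "KeywordDetection"]),
}


def run_pipeline(tasks=TASKS):
    # Build a {task: set of predecessors} graph and run tasks dependencies-first.
    graph = {name: set(deps) for name, (_, deps) in tasks.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        func, deps = tasks[name]
        results[name] = func(*(results[dep] for dep in deps))
    return results


if __name__ == "__main__":
    print(run_pipeline()["CreateCSVOutput"])
```

Because RunDocInference and KeywordDetection do not depend on each other, a scheduler could run them in parallel once PaginatedText has finished; the sketch runs them sequentially to stay short.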