SemesterapparatsManager/src/logic/pdfparser.py
WorldTeacher 0406fe4f6f Refactor and enhance type hints across multiple modules
- Updated the `from_tuple` method in `Prof` class to specify return type.
- Added type hints for various methods in `LehmannsClient`, `OpenAI`, `WebRequest`, and `ZoteroController` classes to improve code clarity and type safety.
- Modified `pdf_to_csv` function to return a string instead of a DataFrame.
- Enhanced error handling and type hints in `wordparser` and `xmlparser` modules.
- Removed unused UI file `Ui_medianadder.ts`.
- Improved the layout and structure of the `semesterapparat_ui` to enhance user experience.
- Updated file picker to support `.doc` files in addition to `.docx`.
- Added unique item handling in `Ui` class to prevent duplicates in apparat list.
- General code cleanup and consistency improvements across various files.
2025-10-21 09:09:54 +02:00

# Extract the text content of a PDF file with pdfquery.
from pdfquery import PDFQuery


def pdf_to_csv(path: str) -> str:
    """
    Extracts the text from a PDF file and returns it as a single string.
    """
    file = PDFQuery(path)
    file.load()
    # get the text from the pdf file
    text_elems = file.extract([("with_formatter", "text"), ("all_text", "*")])
    extracted_text = text_elems["all_text"]
    return extracted_text
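

# A minimal sketch, not part of the original module: `normalize_whitespace`
# is a hypothetical helper for cleaning the string returned by `pdf_to_csv`.
# It assumes that words separated only by a line break should stay separated,
# so it collapses each run of whitespace into one space instead of deleting
# the line breaks outright.
def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace (including line breaks) into one space."""
    # str.split() with no arguments splits on any whitespace run and drops
    # leading and trailing whitespace, so joining with " " normalizes the text.
    return " ".join(text.split())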


if __name__ == "__main__":
    text = pdf_to_csv("54_pdf.pdf")
    # remove linebreaks
    text = text.replace("\n", "")
    # print(text)