geekfence.com
    SwiftText | Cocoanetics

    By Admin, December 28, 2025 (7 min read)

    Over the course of the last year, I’ve had quite a few side projects that required some way to get text from a variety of sources, with code and frameworks found in a number of private repos. A while ago, I felt an inkling to start pulling those together into an open source project. So this will be my Christmas gift for you this year.

    SwiftText collects various ways of getting text — or, if possible, Markdown — from a variety of sources and places.

    Update: … now Images, PDFs, Word DOCX and also HTML pages or URLs.

    One such use case was to get pure text from bank statements for my investment portfolio, so that I could parse the text and construct a CSV file to upload my holdings to Yahoo Finance.

    Reading PDFs

    For the most part these statements were normal PDFs that had been created programmatically. The advantage of those is that you can get the actual text from selection ranges, just like when you select text and copy it to the pasteboard. This is the first kind of PDF you will encounter: one that contains vector data. Essentially, such a file is just a record of drawing commands issued into a vector context.

    But there was a problem: some of those statements had been scanned from paper. This is the other, less useful kind of PDF, which is essentially a collection of bitmap images, one per page. Thankfully, Mac and iOS offer quite capable OCR in the form of the Vision framework.
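
A minimal sketch of what OCR with Vision can look like, assuming the page has already been rendered to a CGImage. This is not SwiftText's actual code: `OCRFragment`, `recognizeText(in:)` and `pixelSize` are hypothetical names. The Vision part only compiles on Apple platforms, hence the `canImport` guard.

```swift
import Foundation
#if canImport(Vision)
import Vision
#endif

/// PDF coordinates are in points (1/72 inch). Converts a page size in points
/// to the pixel size of a bitmap rendered at the given DPI.
func pixelSize(widthInPoints: Double, heightInPoints: Double,
               dpi: Double = 300) -> (width: Int, height: Int) {
    (Int((widthInPoints * dpi / 72).rounded()),
     Int((heightInPoints * dpi / 72).rounded()))
}

#if canImport(Vision)
/// Recognized text plus its bounding box. Vision reports boxes normalized
/// to 0...1 with the origin in the lower-left corner of the image.
struct OCRFragment {
    let text: String
    let boundingBox: CGRect
}

/// Runs text recognition on a bitmap and returns one fragment per observation.
func recognizeText(in image: CGImage) throws -> [OCRFragment] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate
    request.usesLanguageCorrection = true

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])

    return (request.results ?? []).compactMap { observation in
        // Keep only the most likely reading of each observation
        guard let best = observation.topCandidates(1).first else { return nil }
        return OCRFragment(text: best.string, boundingBox: observation.boundingBox)
    }
}
#endif
```

For a US Letter page (612 by 792 points), `pixelSize` gives a 2550 by 3300 pixel bitmap at 300 DPI.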

    With both PDF selection ranges and text fragments from Vision, you get rectangles with text. So I made it such that you only have to ask a PDFPage for its textLines(). It first attempts to get the text from the selection ranges; if that fails, it renders the page into a 300 DPI bitmap and runs OCR on it, giving you more or less the same result. Each text line is composed of the fragments that likely form a line, even though there might be tabs or whitespace between them.
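
To illustrate the idea of assembling fragments into lines, here is a small sketch. It is only one plausible heuristic, not how SwiftText does it: `TextFragment` and `groupIntoLines` are hypothetical names, and the coordinates are assumed to grow downward.

```swift
/// A piece of text with its bounding box (hypothetical type for this sketch).
struct TextFragment {
    let text: String
    let x: Double       // left edge
    let y: Double       // top edge, growing downward
    let width: Double
    let height: Double

    var midY: Double { y + height / 2 }
}

/// Groups fragments into lines: a fragment joins the current line when its
/// vertical center falls within the vertical extent of that line's first fragment.
func groupIntoLines(_ fragments: [TextFragment]) -> [String] {
    // Sort top to bottom, then left to right
    let sorted = fragments.sorted { ($0.y, $0.x) < ($1.y, $1.x) }

    var lines: [[TextFragment]] = []
    for fragment in sorted {
        if let first = lines.last?.first,
           fragment.midY >= first.y,
           fragment.midY <= first.y + first.height {
            lines[lines.count - 1].append(fragment)
        } else {
            lines.append([fragment])
        }
    }

    // Within each line, order fragments by x and join them with a space
    return lines.map { line in
        line.sorted { $0.x < $1.x }.map(\.text).joined(separator: " ")
    }
}
```

Comparing the vertical center against the extent of the first fragment tolerates the small baseline jitter that OCR boxes tend to have, at the cost of misgrouping text that is strongly skewed.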

    This was the state of this private framework for the longest time. It saw a lot more use in a receipt scanner I am building for myself. And when a friend asked me to translate several PDFs, I was extremely lucky to have a quick way to get the raw text from them to feed into ChatGPT. This opened my mind to the possibility that this might be quite useful in agentic scenarios where agents need to get to the text of things.

    So the idea for SwiftText was born: it should be an open source project that collects various forms of getting text — or, if possible, Markdown — from a variety of sources and places.

    Reading DOCX

    With both types of PDF files covered, extracting the OCR part for use with plain bitmap images was a simple exercise. Then there was a case where I had to get the pure text from a Word document (DOCX) instead of a PDF. Granted, I could just copy the text out of it, but my goal is to have this in the form of a tool that I can use to automate such work in the future.

    I had a look at how DOCX files are constructed: they are just a ZIP archive of a couple of XML files. At the heart there is a document.xml which contains the actual document text. So I gave this task to Codex and with nearly no extra input from me it was able to create a utility that would output the pure text from such a Word doc. Behind the scenes it uses XMLParser, so the only external dependency for that is ZIPFoundation, because to my knowledge there is no first-party ZIP reading capability that fits this use case across Apple’s platforms.
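
As a rough sketch of the XMLParser part, the following collects paragraph text from the contents of a document.xml that has already been unzipped. It is not the code Codex produced: `DocumentTextCollector` and `plainText(fromDocumentXML:)` are hypothetical names, and the sample XML is a hand-written, heavily reduced document.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML   // XMLParser lives in this module on Linux
#endif

/// Collects the text of each `w:p` paragraph from WordprocessingML.
final class DocumentTextCollector: NSObject, XMLParserDelegate {
    private(set) var paragraphs: [String] = []
    private var current = ""
    private var insideText = false

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        switch elementName {
        case "w:p": current = ""        // a new paragraph begins
        case "w:t": insideText = true   // text runs carry the visible text
        case "w:tab": current += "\t"
        default: break
        }
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        if insideText { current += string }
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        switch elementName {
        case "w:t": insideText = false
        case "w:p": paragraphs.append(current)
        default: break
        }
    }
}

/// Returns the paragraphs of a document.xml joined by newlines.
func plainText(fromDocumentXML data: Data) -> String {
    let collector = DocumentTextCollector()
    let parser = XMLParser(data: data)
    parser.delegate = collector
    parser.parse()
    return collector.paragraphs.joined(separator: "\n")
}

let sampleXML = """
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t xml:space="preserve"> world</w:t></w:r></w:p>
    <w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>
  </w:body>
</w:document>
"""
print(plainText(fromDocumentXML: Data(sampleXML.utf8)))
```

Text only counts while the parser is inside a `w:t` element, so the indentation between elements in the XML never leaks into the output.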

    Markdown has a slight edge over pure text because it marks emphasis on specific terms, tells us about headlines of different levels, and also clearly structures lists — numbered or bulleted. But my Codex agent also had no problem pulling out this style information from the DOCX contents.
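
To make the style mapping concrete, here is a small sketch that turns an already parsed paragraph into Markdown. It assumes the paragraph style ID (the `w:pStyle` value, such as "Heading1" in documents created with English Word) and the bold and italic flags (`w:b` and `w:i`) have been read from the XML. `Run`, `Paragraph` and `markdown(for:)` are hypothetical names, and lists are left out because they need the numbering definitions as well.

```swift
/// A run of text with uniform formatting (hypothetical model type).
struct Run {
    let text: String
    var bold = false
    var italic = false
}

/// A paragraph with an optional style ID such as "Heading1".
struct Paragraph {
    var styleID: String? = nil
    var runs: [Run]
}

/// Renders a paragraph as a single line of Markdown.
func markdown(for paragraph: Paragraph) -> String {
    let body = paragraph.runs.map { run -> String in
        var text = run.text
        if run.italic { text = "*\(text)*" }
        if run.bold { text = "**\(text)**" }
        return text
    }.joined()

    // "Heading1" ... "Heading6" become "#" ... "######"
    if let style = paragraph.styleID, style.hasPrefix("Heading"),
       let level = Int(style.dropFirst("Heading".count)),
       (1...6).contains(level) {
        return String(repeating: "#", count: level) + " " + body
    }
    return body
}

let heading = Paragraph(styleID: "Heading2", runs: [Run(text: "Reading DOCX")])
print(markdown(for: heading))
```

Style IDs are localized and can be customized per document, so a real converter would need to resolve them through styles.xml rather than match on the name.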

    SwiftText comes with a demo CLI app that lets you extract text from files. This gives you Markdown for a Word file:

    swift run swifttext docx file.docx --markdown

    For PDF or bitmaps you do:

    swift run swifttext ocr file

    For the latter I do have experimental Markdown support, but it has been very challenging to get semantic information from those kinds of sources. I have the beginnings of a semantic parser, again based on Vision, which promises proper paragraphs, tables, and lists. Unfortunately, I have not been able to get it to work reliably so far. The problem with tables is that Vision seems to be very easily thrown off by some layouts, detecting superfluous columns and whatnot. The best approach here would probably be to look for lines that always have text at the same x positions and then infer the table structure from that. This is clearly future work.
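
The x-position idea could be sketched roughly like this. It is an untested-in-the-wild heuristic, not part of SwiftText: `Cell`, `columnStarts` and `tableRows` are hypothetical names, and it only looks at left edges, so right-aligned number columns would need extra handling.

```swift
/// A text fragment reduced to its text and left edge (hypothetical type).
struct Cell {
    let text: String
    let x: Double
}

/// Returns the x positions of the first row that recur in every other row,
/// within `tolerance`. These are the candidate column starts.
func columnStarts(rows: [[Double]], tolerance: Double = 3) -> [Double] {
    guard let firstRow = rows.first else { return [] }
    return firstRow.filter { candidate in
        rows.allSatisfy { row in
            row.contains { abs($0 - candidate) <= tolerance }
        }
    }.sorted()
}

/// Arranges lines into table rows, or returns nil when fewer than two
/// shared column starts are found (the lines do not look like a table).
func tableRows(_ lines: [[Cell]], tolerance: Double = 3) -> [[String]]? {
    let starts = columnStarts(rows: lines.map { $0.map(\.x) }, tolerance: tolerance)
    guard starts.count >= 2 else { return nil }

    return lines.map { line in
        starts.map { start in
            line.first { abs($0.x - start) <= tolerance }?.text ?? ""
        }
    }
}
```

Fragments that do not line up with a shared column start, such as a stray footnote marker, are simply dropped from the row.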

    Of course the easiest option would be to hand your files to ChatGPT, or some local vision-enabled LLM, and ask it to give you the text. But with this decision you leave the realm of perfect determinism and structure, and you also start paying for tokens. There is still something to be said for a purely local solution that leverages functionality available natively on Apple platforms. The reliance on the Vision framework in particular will make it impossible for this to ever be available on other platforms. But alas, I can live with only being able to support iOS and Mac with SwiftText.

    Warning: Traits

    This package has another first for me: package traits.

    With those — if you use Swift tools 6.1 or higher — you can import SwiftText as an umbrella module which itself contains SwiftTextOCR, SwiftTextPDF, and SwiftTextDOCX.

    If I understand that correctly, at some point in the future SwiftPM will be able to omit external dependencies if they are not needed. Right now they are still being resolved and downloaded, although not compiled if not referenced by code. The one immediate nicety is that you can simply import SwiftText in your code, and the specified traits decide what gets packaged into that for you.

    This is an improvement over the previous method of having separate imports for all targets/products you want: import SwiftTextPDF and import SwiftTextDOCX (and perhaps future traits like — dare I say — HTML).
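
For readers who have not seen traits yet, this is roughly what declaring them looks like in a manifest, as far as I understand the Swift 6.1 package traits feature. The package name `TextKit` and the trait names `PDF` and `DOCX` are made up for illustration and are not taken from SwiftText's manifest; only the ZIPFoundation dependency mirrors what the article mentions.

```swift
// swift-tools-version: 6.1
import PackageDescription

let package = Package(
    name: "TextKit",  // hypothetical package name
    products: [
        .library(name: "TextKit", targets: ["TextKit"])
    ],
    traits: [
        "PDF",                              // hypothetical trait names
        "DOCX",
        .default(enabledTraits: ["PDF"])    // enabled when a client specifies nothing
    ],
    dependencies: [
        .package(url: "https://github.com/weichsel/ZIPFoundation.git", from: "0.9.0")
    ],
    targets: [
        .target(
            name: "TextKit",
            dependencies: [
                // Only linked when the DOCX trait is enabled
                .product(name: "ZIPFoundation", package: "ZIPFoundation",
                         condition: .when(traits: ["DOCX"]))
            ]
        )
    ]
)
```

A client would then opt in by passing `traits: ["DOCX"]` in its own `.package(...)` dependency declaration, and the package's sources can check for enabled traits with `#if DOCX`.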

    Quo Vadis?

    I have a few more private things that I would like to see move into SwiftText. I do have a functioning tool that gets Markdown from HTML, which requires libXML. This is handy for getting an LLM-friendly version of web pages.

    Some web pages build their content with JavaScript, for example the OpenAI API documentation. I’ve got a solution for that as well: it loads the web page with WebKit, waits for the DOM to be complete, then extracts the DOM’s HTML and parses that.

    So these will be some of the next additions to this project. Then there’s of course more document semantics. It would be great to get proper Markdown tables from anywhere. We’ll see about that. That might come more quickly from Word than from PDFs because XML is orders of magnitude more structured than PDFs.

    Conclusion

    I am excited to share SwiftText with the OSS community because it has proven its worth to me on many occasions. I could have waited until it is even more polished but I was eager to make my work here public. I have some ideas for the future direction of SwiftText and I invite you to get in touch with specific use cases where enhancements might fit with the spirit of SwiftText.

    Update, later the same day….

    Because Codex is really amazing at copying code between projects while integrating it, I was able to add my libXML-based HTMLParser as well as the code to convert HTML to Markdown. Enjoy!



    Categories: Projects


