  • pietroppeter 1 day ago
    For the curious about how it works (not mentioned in the README): it uses pymupdf and a precise mapping of every piece of information to area coordinates, so the document layout is hard-coded.
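    A minimal sketch of that idea, assuming the words arrive in the tuple shape PyMuPDF's `page.get_text("words")` returns, `(x0, y0, x1, y1, text, block_no, line_no, word_no)`. The field names and rectangles in `FIELD_REGIONS` are hypothetical placeholders, not the project's actual mapping:

```python
from typing import Dict, List, Tuple

# A word tuple in the shape PyMuPDF's page.get_text("words") returns:
# (x0, y0, x1, y1, text, block_no, line_no, word_no)
Word = Tuple[float, float, float, float, str, int, int, int]
Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in PDF points

# Hypothetical layout: each field lives in a fixed rectangle on the page.
FIELD_REGIONS: Dict[str, Rect] = {
    "account_holder": (50, 100, 300, 115),
    "iban": (50, 120, 400, 135),
    "closing_balance": (400, 700, 560, 715),
}


def inside(word: Word, rect: Rect) -> bool:
    """True if the word's centre point falls inside the rectangle."""
    cx = (word[0] + word[2]) / 2
    cy = (word[1] + word[3]) / 2
    x0, y0, x1, y1 = rect
    return x0 <= cx <= x1 and y0 <= cy <= y1


def extract_fields(words: List[Word], regions: Dict[str, Rect]) -> Dict[str, str]:
    """Join, in reading order, the words whose centres lie in each field's rectangle."""
    result = {}
    for name, rect in regions.items():
        hits = [w for w in words if inside(w, rect)]
        hits.sort(key=lambda w: (w[5], w[6], w[7]))  # block, line, word order
        result[name] = " ".join(w[4] for w in hits)
    return result


if __name__ == "__main__":
    # Illustrative words only, not taken from a real statement.
    sample = [
        (60, 102, 110, 113, "Mario", 0, 0, 0),
        (115, 102, 170, 113, "Rossi", 0, 0, 1),
        (60, 122, 250, 133, "IT60X0542811101000000123456", 1, 0, 0),
        (420, 702, 480, 713, "1.234,56", 2, 0, 0),
    ]
    print(extract_fields(sample, FIELD_REGIONS))
```

    With PyMuPDF (imported as `pymupdf`, or `fitz` in older releases) the input would come from `doc[page_no].get_text("words")`; `Page.get_text` also accepts a `clip` rectangle if you would rather let the library do the filtering. Matching on the word's centre point tolerates small shifts a bit better than requiring the whole box to fit.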

    When the layout changes this breaks, but layout changes on this sort of document don't happen often (I think). Also, the code is very clean and it seems straightforward to fix.
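    One cheap way to notice such a layout change instead of silently extracting the wrong text, sketched under the same assumptions (PyMuPDF-style word tuples; the anchor labels and rectangles are hypothetical): check that a few static labels still sit where the hard-coded mapping expects them before trusting it.

```python
from typing import Dict, List, Tuple

# (x0, y0, x1, y1, text, block_no, line_no, word_no), as from page.get_text("words")
Word = Tuple[float, float, float, float, str, int, int, int]
Rect = Tuple[float, float, float, float]

# Hypothetical static labels and the rectangle each is expected to sit in.
ANCHORS: Dict[str, Rect] = {
    "IBAN": (10, 118, 48, 137),
    "Saldo": (400, 690, 460, 720),
}


def missing_anchors(words: List[Word], anchors: Dict[str, Rect]) -> List[str]:
    """Return the anchor labels that are not fully inside their expected rectangle."""
    missing = []
    for label, (x0, y0, x1, y1) in anchors.items():
        found = any(
            w[4] == label
            and x0 <= w[0] and w[2] <= x1
            and y0 <= w[1] and w[3] <= y1
            for w in words
        )
        if not found:
            missing.append(label)
    return missing


def check_layout(words: List[Word]) -> None:
    """Raise before extraction if the page no longer matches the expected layout."""
    missing = missing_anchors(words, ANCHORS)
    if missing:
        raise ValueError(f"layout changed? anchors not found: {missing}")
```

    Failing loudly keeps the hard-coded approach honest: the fix is usually just updating coordinates, but you want to know when it is needed.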

    Maybe this kind of code is something an LLM/agent could generate? (It would be easy to write checks for it.)
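    As an example of such a check, here is a sketch (my own illustration, not something the project is known to do) that reconciles the extracted movements against the statement's own balances; the `(debit, credit)` row shape is an assumption:

```python
from decimal import Decimal
from typing import Iterable, Tuple


def balances_reconcile(
    opening: Decimal,
    movements: Iterable[Tuple[Decimal, Decimal]],
    closing: Decimal,
) -> bool:
    """True if opening + sum(credits) - sum(debits) equals the closing balance.

    Each movement is a (debit, credit) pair; Decimal avoids float rounding errors.
    """
    total = opening
    for debit, credit in movements:
        total += credit - debit
    return total == closing


if __name__ == "__main__":
    # Illustrative numbers only.
    rows = [(Decimal("200.50"), Decimal("0")), (Decimal("0"), Decimal("35.06"))]
    print(balances_reconcile(Decimal("1000.00"), rows, Decimal("834.56")))  # True
```

    A missed or duplicated row, or a misread digit, makes the totals disagree, so a check like this could be used to accept or reject generated parsing code automatically.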

    Besides the practical value for those who need it, I think this approach may be interesting for others to look at.

    Neat project, thanks for sharing!

    • genbs 1 day ago
      Thanks for reporting and for the feedback
  • liendolucas 1 day ago
    Off-topic: stay away from Poste Italiane at all costs. The worst bank I have ever dealt with in my entire life. I'm glad I don't have to deal with them anymore. Terrible service and endless waits at their branches. They are extremely incompetent.
    • sebtron 1 day ago
      Not to mention it is also pretty terrible as a postal service.
      • Beijinger 1 day ago
        Underrated Quality of Life Indicator: Reliability of Postal Services

        https://expatcircle.com/cms/underrated-quality-of-life-indic...

        They’re actually doing alright on that list — and Belgium isn’t doing too badly either, especially considering it used to rank below Moldova a few years ago. That said, Belgium’s postal service is awful in every possible way. I once had packet losses of 50%, and the ping was miserable too.

        During COVID, I needed stamps. The local post office? Closed. So I tried buying them online — and yes, you can order stamps online. But guess how they deliver them? By snail mail. Classic.

        Now compare that to Germany. While Germany isn’t exactly a digital pioneer, its postal system has consistently performed well. I think they even offered a letter-scanning service at some point. And for years now, you’ve been able to print stamps at home. No printer? No problem — just write a code on the envelope with a pen. Every stamp includes a tracking code. As for those Amazon parcel lockers? Germany had them ages ago. And if I’m not mistaken, the idea was borrowed from former East Germany.

        USPS is actually fairly reliable, but the post offices themselves feel run-down and neglected. Sometimes the solutions are simple. If I were running USPS, I’d tour post offices around the world to see what unique services they offer. What can we adopt, license, or copy? How are they staying profitable? There’s so much to learn.

        • gpderetta 1 day ago
          As a data point: CardMarket is an online, Europe-wide marketplace for collectible trading cards. Delivery from Italian sellers is usually much slower than from pretty much anywhere else, and Italian sellers invariably sell at a discount (even when selling English-language cards).
    • nkjoep 1 day ago
      In fact, it's not even a bank.
      • koakuma-chan 1 day ago
        A bit off topic, but how do you guys use banks to buy expensive things? I always run into issues, like I couldn't pay for my dentist because my card had a 300$ tap limit (I had to come back the next day and pay the rest, lol), and there are all sorts of limits: I don't think I can spend more than a couple thousand dollars a day even if I physically bring my card and insert it or whatever.
        • piltdownman 1 day ago
          Add your card to your Google Wallet, and then use NFC from your phone for contactless payment. In Ireland the tap limit on a card is €50, but uncapped AFAIK using the same card via your Google Wallet.

          Fintech banks like Revolut, which come with a separate IBAN and physical/virtual cards, are helpful in such scenarios as well.

        • HelloNurse 1 day ago
          Currently, payments from your account to another account (you need the other party's IBAN) cost about 1.50€, probably less if you find a less greedy bank, and are executed immediately.

          Retro options include cheques or (better) cashier's cheques.

        • vladvasiliu 1 day ago
          All the banks I know have some limit over a sliding 30-day window. Some allow this limit to be adjusted, but the lowest limit I've had was 1000 €.

          Contactless is limited to 50 € per transaction. Going above that requires inserting the card and entering the PIN.

        • ThePowerOfFuet 1 day ago
          >I always run into issues, like I couldn't pay for my dentist because my card had a 300$ tap limit

          Insert it and enter your PIN.

          • koakuma-chan 1 day ago
            I don't remember my PIN
            • 47282847 10 hours ago
              You’re free to add a contact to your phone contacts for Aunt Helga with a local area code and your PIN repeated twice.
    • genbs 1 day ago
      it was a mistake in my youth
  • denysvitali 1 day ago
    I would love to have something more generic (and have already tried to build it), but parsing tables and bank statements, even from digital PDFs (that is, ones that contain real tables rather than a picture), is still very difficult, especially when the bank changes the layout from one month to the next.

    I would love to be proven wrong, but everything I have tried so far is... subpar.

    Nowadays there's probably a solution based on LLMs, but I don't trust them with this kind of data

    • vdm 1 day ago
    • dimitri-vs 1 day ago
      Have you tried datalab-to/marker with the "Use LLM" option? They have a playground you can test it out on https://www.datalab.to/playground but I use their local CLI option: https://github.com/datalab-to/marker

      I just tried it on a fairly ugly TD Bank statement PDF I have and the markdown of the whole PDF (tables and all) is very accurate. Here is the config I use:

      marker_single --format_lines --use_llm --llm_service marker.services.gemini.GoogleGeminiService --gemini_model_name gemini-2.5-flash --disable_image_extraction --output_format markdown --output_dir "$OutDir" "$In"

      You might be able to tell the LLM to output the data directly in CSV format (granted, it will still be in a .md file) using `--block_correction_prompt`, which apparently is "useful for custom formatting or logic that you want to apply to the output".

      • denysvitali 1 day ago
        > Nowadays there's probably a solution based on LLMs, but I don't trust them with this kind of data

        If it works with a small model I can run locally, I might consider this approach; otherwise I'll skip it.

    • jgalt212 1 day ago
      > Nowadays there's probably a solution based on LLMs, but I don't trust them with this kind of data

      In practice, from my perspective the flow looks like LLM parser -> normalizer -> validator. So you only save one step (the parser), and given the stochastic nature of LLM output, the normalizer and validator can be trickier to write than those for an old-fashioned rules-based parser. But every situation is different, so YMMV.
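      A sketch of the normalizer -> validator half of that flow, assuming the parser (LLM or not) emits rows as dicts of strings with hypothetical keys `date`, `description` and `amount`, in Italian formats (`31/01/2024`, `1.234,56`). It rejects anything off-format rather than guessing, since with stochastic output the main risk is a plausible-looking wrong value:

```python
import re
from datetime import datetime
from decimal import Decimal
from typing import Dict, List, Tuple

# Italian amount format: optional minus, optional thousands dots, exactly two decimals.
AMOUNT_RE = re.compile(r"-?(?:\d{1,3}(?:\.\d{3})+|\d+),\d{2}")


def normalize_amount(raw: str) -> Decimal:
    """'1.234,56' -> Decimal('1234.56'); raises ValueError on any other format."""
    cleaned = raw.replace("€", "").replace(" ", "")
    if not AMOUNT_RE.fullmatch(cleaned):
        raise ValueError(f"unexpected amount format: {raw!r}")
    return Decimal(cleaned.replace(".", "").replace(",", "."))


def normalize_row(row: Dict[str, str]) -> Dict[str, object]:
    """Convert one raw row into typed values (date, collapsed description, Decimal)."""
    return {
        "date": datetime.strptime(row["date"].strip(), "%d/%m/%Y").date(),
        "description": " ".join(row["description"].split()),
        "amount": normalize_amount(row["amount"]),
    }


def normalize_and_validate(
    rows: List[Dict[str, str]],
) -> Tuple[List[Dict[str, object]], List[str]]:
    """Return (valid normalized rows, error messages) without stopping at the first error."""
    valid, errors = [], []
    for i, row in enumerate(rows):
        try:
            norm = normalize_row(row)
        except (KeyError, ValueError) as exc:
            errors.append(f"row {i}: {exc!r}")
            continue
        if not norm["description"]:
            errors.append(f"row {i}: empty description")
            continue
        valid.append(norm)
    return valid, errors
```

      Note that an English-formatted amount such as `1234.56` is rejected instead of being silently read as 123456, which is the kind of drift a rules-based parser rarely produces but an LLM might.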

  • simonebrunozzi 1 day ago
    Subtitled: "agli sventurati che hanno un conto postale" ("(dedicated to) the unlucky ones that have a bank account with the Italian postal service")

    The usual, amazing irony of us Italians. Love it.

  • amadeuspagel 1 day ago
    My brother made a similar tool for Trade Republic: https://kontoauszug.jonathanpagel.com/
  • Tox46 1 day ago
    Loving the subtitle in the md
    • brightbeige 1 day ago
      > agli sventurati che hanno un conto postale

      translates to:

      “to the unfortunate ones who have a postal account”

  • rcastellotti 1 day ago
    POPOPO POPOPO POPO POPOOOO come on, fellow Italians, sing with me!