I Analyzed My Finance With Local LLMs

Thu Vu data analytics
31 Jan 2024 17:51

TLDR: The video details the speaker's journey in analyzing their personal finances with local large language models (LLMs). They discuss the importance of reviewing financial transactions and the difficulty of categorizing expenses. Inspired by someone else's income and expense breakdown, the speaker decides to run an open-source LLM locally to classify and analyze their bank transactions, with the aim of working out their retirement timeline. They explore frameworks such as llama.cpp and GPT4All for running LLMs securely and privately on a laptop. After experimenting with several models, they find Llama 2 effective for categorizing expenses, while noting that LLMs are unreliable at basic arithmetic. The speaker then creates a custom model file for a financial assistant, 'Expense Analyzer,' and uses it to classify transactions, working in Python and Jupyter Notebooks to interact with the LLM and process the transaction data. Finally, they build a personal finance dashboard with Plotly Express and Panel to visualize income and expense breakdowns for 2022 and 2023. The speaker concludes by encouraging viewers to experiment with open-source LLMs for personal projects.

Takeaways

  • 💼 Realizing the importance of managing finances, the individual downloads bank transactions to review incomes and expenses periodically.
  • 📈 Inspired by someone else's income and expense breakdown, they decide to classify their own expenses into categories.
  • 🚫 Concerned about privacy, they avoid uploading sensitive bank statements to a chat website or sending them through the OpenAI API, since the data could be stored on someone else's servers.
  • 💡 They decide to use a local, open-source large language model (LLM) to analyze their financial data, which keeps the data private and requires no internet connection.
  • 🛠️ Several frameworks are available to run LLMs locally, such as llama.cpp, GPT4All, and Ollama, which use quantization to make models efficient enough to run on a laptop.
  • 📱 The process of installing and using an LLM locally is straightforward and can be done through the terminal on Mac, Linux, or via Docker on Windows.
  • 🤖 Testing the LLM's ability to perform basic arithmetic and classify expenses from bank statements, with varying degrees of success.
  • 📉 Finding that LLMs may not be the best for basic arithmetic but can classify expenses into categories with reasonable accuracy.
  • 🔄 Discovering that asking the same question to LLMs multiple times can yield different answers, indicating a level of randomness in responses.
  • 🛠️ Customizing LLMs for specific tasks by creating a model file that specifies the base model and parameters like temperature for model behavior.
  • 🔗 Using Python and Jupyter Notebook for a more convenient interaction with LLMs, installing necessary libraries for integration.
  • 📊 Creating a personal finance dashboard with Plotly Express and Panel to visualize income and expense breakdowns for 2022 and 2023, including monthly earnings and expenditures.

Q & A

  • Why does the speaker decide to analyze their finances?

    -The speaker realizes the importance of managing finances as they grow older and decides to analyze their bank transactions to review incomes and expenses.

  • What is the challenge the speaker faces when classifying expenses?

    -The challenge is sorting the expenses from bank transactions into appropriate categories, which often requires manual labor or low-tech methods.

  • Why does the speaker decide to use a local LLM instead of online services?

    -The speaker is concerned about the privacy and security of their sensitive financial information and prefers to use an open-source large language model locally on their laptop.

  • What are the benefits of running a large language model locally?

    -Running a large language model locally keeps personal data secure, is free, and requires neither an internet connection nor third-party services.

  • Which frameworks are mentioned for running an open-source language model locally?

    -The frameworks mentioned are llama.cpp, GPT4All, and Ollama.

  • How does the speaker intend to use the LLM for their bank statement analysis?

    -The speaker plans to use the LLM to classify all the expenses in their bank statement into categories such as groceries, rent, travel, etc., and then analyze the data in Python to create visualizations.

  • What is the issue with using the Mistral model for classifying expenses?

    -The Mistral model does not categorize the expenses as expected, misses some transactions, and does not reply in the desired format.

  • How does the speaker address the issue of the LLM providing different answers to the same question?

    -The speaker creates a custom model file with specific parameters and a custom system message to better tailor the LLM to their use case.

  • What is the process for interacting with the LLM through Python?

    -The speaker installs the langchain-community library, accesses the language models through Ollama by specifying the model name, and uses a for loop to process transactions in batches of 30.

  • How does the speaker handle the token limit when classifying a large number of transactions?

    -The speaker uses a for loop to process the transactions in batches, taking 30 transactions at a time to avoid exceeding the token limit of the LLM.

  • What visualization tools does the speaker use to create a personal finance dashboard?

    -The speaker uses Plotly Express for creating interactive visualizations and Panel for organizing the dashboard.

Outlines

00:00

💰 Personal Finance Management with AI

The speaker discusses the importance of managing finances and how they review their bank transactions annually. Inspired by someone else's income and expense breakdown, they decide to classify their expenses into categories. They mention the challenge of categorizing expenses and the privacy concerns associated with uploading bank statements to a third-party service. To address this, they choose to run an open-source large language model (LLM) locally on their laptop, specifically mentioning Llama 2. They also discuss the process of installing and running an LLM, and frameworks such as llama.cpp and GPT4All for local deployment. The speaker shares a GitHub link for the code used in the video and mentions a discount offer by Coursera for data analytics courses.

05:01

🤖 Testing LLMs for Expense Classification

The speaker tests the ability of LLMs to perform basic arithmetic and classify expenses from bank statements. They find that while LLMs can attempt complex tasks, they do not always give accurate arithmetic results. When classifying expenses, the Mistral model turns out to be neither entirely accurate nor in the desired format. They then switch to the Llama 2 model, which categorizes expenses as expected. The speaker also explains how to create a custom model file that tailors the LLM to a specific task, here a financial assistant for expense classification, and how to use it with Ollama. They highlight the flexibility of LLMs and the randomness in their responses when asked the same question multiple times.
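
A custom model file of the kind described above could look roughly like the text this Python sketch generates. It is only an illustration: the temperature value and the wording of the system message are assumptions rather than the ones used in the video, and the helper name `build_modelfile` is hypothetical. The `FROM`, `PARAMETER`, and `SYSTEM` instructions follow Ollama's Modelfile format.

```python
from pathlib import Path


def build_modelfile(base_model: str, temperature: float, system_message: str) -> str:
    """Render the text of an Ollama Modelfile (hypothetical helper)."""
    return (
        f"FROM {base_model}\n"                      # base model to build on
        f"PARAMETER temperature {temperature}\n"    # lower values give less random output
        f'SYSTEM """{system_message}"""\n'          # custom system message
    )


# Assumed wording; the video's exact system message is not reproduced here.
SYSTEM_MESSAGE = (
    "You are a financial assistant. You help classify expenses and income "
    "from bank transactions into categories."
)

# Assumed temperature of 0.2; the value used in the video may differ.
modelfile_text = build_modelfile("llama2", 0.2, SYSTEM_MESSAGE)


def write_modelfile(path: str = "Modelfile") -> None:
    """Write the rendered Modelfile to disk so Ollama can read it."""
    Path(path).write_text(modelfile_text, encoding="utf-8")
```

After writing the file, a command along the lines of `ollama create expense_analyzer -f Modelfile` would register the custom model, and `ollama run expense_analyzer` would start it. The model name `expense_analyzer` is an assumed spelling of the 'Expense Analyzer' name mentioned in the video.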

10:03

📊 Automating Expense Classification with Python

The speaker outlines the process of using Python to automate the classification of expenses with the LLM. They mention installing the langchain-community library to access the language models through Ollama. The speaker reads the transaction data and discusses the challenge of inserting a large number of transactions into the prompt without exceeding the token limit of the LLM. They settle on feeding transactions to the model in batches of 30. Using a for loop, they process their bank statements, validate the output, and store the categorized transactions in a data frame. The speaker also describes how to handle interruptions in the process and how to save the results as a CSV file.
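
The batching loop described above can be sketched as follows. This is a minimal illustration, not the code from the video: the prompt wording, the 'expense - category' reply format, and all function names are assumptions, and the model call is passed in as a plain function so the example runs without Ollama installed.

```python
from typing import Callable, Dict, Iterator, List

BATCH_SIZE = 30  # batch size mentioned in the video


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_prompt(transactions: List[str]) -> str:
    """Build one prompt for a batch (assumed wording)."""
    return (
        "Can you add an appropriate category to the following expenses? "
        "Respond with one line per expense in the form 'expense - category'.\n"
        + "\n".join(transactions)
    )


def parse_response(text: str) -> Dict[str, str]:
    """Keep only lines that look like 'expense - category'."""
    result: Dict[str, str] = {}
    for line in text.splitlines():
        name, sep, category = line.rpartition(" - ")
        if sep and name.strip() and category.strip():
            result[name.strip()] = category.strip()
    return result


def categorize(transactions: List[str], llm: Callable[[str], str],
               size: int = BATCH_SIZE) -> Dict[str, str]:
    """Send the transactions to the model batch by batch and merge the replies."""
    categories: Dict[str, str] = {}
    for batch in batched(transactions, size):
        parsed = parse_response(llm(build_prompt(batch)))
        # Validation step: ignore anything the model returned that was not asked for.
        categories.update({k: v for k, v in parsed.items() if k in batch})
    return categories


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: labels every expense 'Other'."""
    expenses = prompt.splitlines()[1:]  # skip the instruction line
    return "\n".join(f"{expense} - Other" for expense in expenses)
```

With langchain-community installed and Ollama running, something like `Ollama(model="llama2").invoke` from `langchain_community.llms` could presumably be passed in place of `fake_llm`. The resulting dictionary can then be loaded into a data frame and merged with the original transactions.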

15:04

📈 Creating a Personal Finance Dashboard

The speaker describes the final step of creating a personal finance dashboard to visualize income and expenses. They use Plotly Express and Panel to create an interactive dashboard showing the breakdown of income and expenses for the years 2022 and 2023. The dashboard also includes a monthly comparison of earnings and expenditures. The speaker discusses the limitations of their overview, acknowledging that assets like investments and mortgage payments are not included in the expenses. They conclude by expressing hope that the project inspires viewers to experiment with open-source language models and envision a future where large language models are commonly run on personal devices.
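
The numbers behind such a dashboard come from two simple aggregations: totals per category for a given year, and income versus expenses per month. The sketch below shows them in plain Python under assumed conventions: each row is a `(date, category, amount)` tuple, positive amounts are income, negative amounts are expenses, and the sample rows are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed row layout: (date as "YYYY-MM-DD", category, amount).
Transaction = Tuple[str, str, float]


def yearly_breakdown(rows: List[Transaction], year: int) -> Dict[str, Dict[str, float]]:
    """Total income and expenses per category for one year."""
    totals = {"income": defaultdict(float), "expense": defaultdict(float)}
    for date, category, amount in rows:
        if date.startswith(f"{year}-"):
            kind = "income" if amount > 0 else "expense"
            totals[kind][category] += abs(amount)
    return {kind: dict(per_category) for kind, per_category in totals.items()}


def monthly_totals(rows: List[Transaction], year: int) -> Dict[str, Dict[str, float]]:
    """Total income and expenses per month ("YYYY-MM") for one year."""
    months = defaultdict(lambda: {"income": 0.0, "expense": 0.0})
    for date, _category, amount in rows:
        if date.startswith(f"{year}-"):
            kind = "income" if amount > 0 else "expense"
            months[date[:7]][kind] += abs(amount)
    return dict(sorted(months.items()))


# Made-up sample data, for illustration only.
sample_rows: List[Transaction] = [
    ("2022-01-25", "Salary", 3000.0),
    ("2022-01-02", "Rent", -1200.0),
    ("2022-01-10", "Groceries", -85.5),
    ("2022-02-03", "Groceries", -40.0),
    ("2023-01-25", "Salary", 3200.0),
]
```

Each result could then be converted to a data frame and handed to Plotly Express functions such as `px.pie` (category breakdown) or `px.bar` (monthly comparison), with Panel arranging the charts on one page.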

Keywords

💡Bank Transactions

Bank transactions refer to the financial activities recorded by a bank for an individual or business. In the video, the speaker downloads all their bank transactions to review incomes and expenses, which is a common practice for personal finance management. It is a crucial step in analyzing one's financial health.

💡Expense Classification

Expense classification is the process of categorizing expenses into specific groups or categories such as groceries, rent, or travel. In the video, the speaker uses a large language model (LLM) to automatically categorize their expenses, which helps in understanding spending patterns and managing finances more effectively.

💡Large Language Model (LLM)

A large language model (LLM) is an artificial intelligence model that is trained on a vast amount of text data to predict and generate human-like language. In the context of the video, the speaker uses an LLM locally to classify expenses and analyze financial data, demonstrating the application of AI in personal finance.

💡Local Installation

Local installation refers to the process of installing and running software, such as an LLM, on a personal computer rather than relying on an internet connection or third-party services. The video emphasizes the security and privacy benefits of running an LLM locally, as it protects sensitive financial data.

💡Quantization

Quantization in the context of LLMs is the process of reducing the precision of the model's weights to use less memory and computational resources. The video mentions quantization as a technique used by frameworks to make LLMs more efficient for local use on consumer devices.
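
As a rough illustration of the idea, the sketch below applies simple symmetric 8-bit quantization to a handful of weights and estimates how model size shrinks with fewer bits per weight. It is a toy example under stated assumptions: real frameworks use more elaborate block-wise schemes, the function names are hypothetical, and the size estimate ignores any overhead beyond the weights themselves.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integers."""
    return [q * scale for q in quantized]


def model_size_gb(n_params: int, bits_per_weight: int) -> float:
    """Approximate storage for the weights alone, in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


weights = [0.5, -1.27, 0.0, 0.33]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Rounding error per weight is at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Under these assumptions, a model with 7 billion parameters needs about 14 GB at 16 bits per weight but only about 3.5 GB at 4 bits per weight, which is what makes it practical on a laptop.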

💡Personal Data

Personal data includes any information that can be used to identify an individual, such as their spending habits, places visited, and more. The video script discusses concerns over uploading bank statements due to the sensitive personal data they contain, highlighting the importance of data privacy.

💡API

An API, or Application Programming Interface, is a set of rules and protocols that allows different software applications to communicate with each other. The video mentions the OpenAI API as an option for data analysis, but also notes the potential privacy risks of sending data via an API.

💡Data Analytics

Data analytics is the process of examining raw data with the purpose of drawing conclusions about that information. In the video, the speaker uses data analytics techniques to gain insights from their financial data, which is a key aspect of the project's goal to understand personal finances better.

💡GitHub

GitHub is a web-based platform for version control and collaboration that allows developers to work together on projects. The video mentions that the code is shared on GitHub, a common way for developers to publish and collaborate on code for projects like the one discussed in the video.

💡Python

Python is a high-level programming language widely used for general-purpose programming. In the video, Python is used to analyze financial data and create visualizations, showcasing its utility in data analysis and financial management.

💡Jupyter Notebook

A Jupyter Notebook is an open-source web application that allows the creation and sharing of documents that contain live code, equations, visualizations, and narrative text. The video script mentions using Jupyter Notebook for interacting with LLMs and managing the financial data analysis process.

💡Plotly Express

Plotly Express is a Python library for creating interactive and publication-quality visualizations. In the video, it is used to create a personal finance dashboard with visualizations of income and expense breakdowns, demonstrating its role in making complex data more accessible and understandable.

Highlights

The speaker emphasizes the importance of managing finances by reviewing bank transactions regularly.

Inspired by someone else's income and expense breakdown, the speaker decides to do the same.

Classifying expenses into categories is identified as the most challenging part of the process.

The speaker chooses to use a local large language model (LLM) for expense classification instead of uploading sensitive data online.

Local LLMs are highlighted as a secure and free alternative to online APIs for personal data protection.

Frameworks like llama.cpp and GPT4All are introduced to run open-source language models locally.

The process of quantization and efficiency improvement for LLMs using local frameworks is explained.

A step-by-step guide on installing and using the LLM 'Mistral' locally is provided.

The speaker discovers that LLMs may not be the best for basic arithmetic but can classify expenses into categories.

Llama 2 is found to be more effective than Mistral for the task of expense classification.

Customizing LLMs with a model file allows for tailored responses to specific use cases.

The creation of a custom model file named 'expense analyzer' is demonstrated.

Using Python and Jupyter Notebooks for interacting with LLMs is shown to be more convenient.

The speaker details the process of reading and classifying 300 bank transactions using LLMs.

A method for handling token limits by processing transactions in batches is described.

The use of the 'pandas' library for data validation and cleaning is mentioned.

The speaker successfully categorizes transactions and merges them into a main dataset.

A personal finance dashboard is created using Plotly Express and Panel to visualize income and expense data.

The dashboard provides a breakdown of income and expenses for 2022 and 2023, and monthly earnings and spending.

The project concludes with a note on the importance of considering assets in personal finance, not just expenses.

The speaker expresses optimism about the future of using LLMs locally on personal devices.