How to implement an integration package
This guide walks through the process of implementing a LangChain integration package.
Integration packages are just Python packages that can be installed with `pip install <your-package>`, which contain classes that are compatible with LangChain's core interfaces.
We will cover:
- (Optional) How to bootstrap a new integration package
- How to implement components, such as chat models and vector stores, that adhere to the LangChain interface.
(Optional) bootstrapping a new integration package
In this section, we will outline two options for bootstrapping a new integration package; you're welcome to use other tools if you prefer!
- langchain-cli: This is a command-line tool that can be used to bootstrap a new integration package with a template for LangChain components and Poetry for dependency management.
- Poetry: This is a Python dependency management tool that can be used to bootstrap a new Python package with dependencies. You can then add LangChain components to this package.
Option 1: langchain-cli (recommended)
Option 2: Poetry (manual)
Push your package to a public GitHub repository
This is only required if you want to publish your integration in the LangChain documentation.
- Create a new repository on GitHub.
- Push your code to the repository.
- Confirm that your repository is viewable by the public (e.g., in a private browsing window where you're not logged into GitHub).
Implementing LangChain components
LangChain components are subclasses of base classes in langchain-core. Examples include chat models, vector stores, tools, embedding models and retrievers.
Your integration package will typically implement a subclass of at least one of these components. Expand the tabs below to see details on each.
- Chat models
- Vector stores
- Embeddings
- Tools
- Retrievers
Refer to the Custom Chat Model guide for details on a starter chat model implementation.
You can start from the following template or langchain-cli command:
```bash
langchain-cli integration new \
  --name parrot-link \
  --name-class ParrotLink \
  --src integration_template/chat_models.py \
  --dst langchain_parrot_link/chat_models.py
```
Example chat model code
Your vector store implementation will depend on your chosen database technology. `langchain-core` includes a minimal in-memory vector store (`InMemoryVectorStore`) that we can use as a guide; its source code is a useful reference.
All vector stores must inherit from the `VectorStore` base class. This interface consists of methods for writing, deleting, and searching for documents in the vector store. `VectorStore` supports a variety of synchronous and asynchronous search types (e.g., nearest-neighbor or maximum marginal relevance), as well as interfaces for adding documents to the store. See the API Reference for all supported methods. The required methods are tabulated below:
| Method/Property | Description |
|---|---|
| `add_documents` | Add documents to the vector store. |
| `delete` | Delete selected documents from the vector store (by IDs). |
| `get_by_ids` | Get selected documents from the vector store (by IDs). |
| `similarity_search` | Get documents most similar to a query. |
| `embeddings` (property) | Embeddings object for the vector store. |
| `from_texts` | Instantiate the vector store by adding texts. |
Note that `InMemoryVectorStore` implements some optional search types, as well as convenience methods for loading and dumping the object to a file, but this is not necessary for all implementations.
The in-memory vector store is tested against the standard tests in the LangChain GitHub repository.
Example vector store code
Embeddings are used to convert `str` objects from `Document.page_content` fields into a vector representation (represented as a list of floats).
Refer to the Custom Embeddings guide for details on a starter embeddings implementation.
You can start from the following template or langchain-cli command:
```bash
langchain-cli integration new \
  --name parrot-link \
  --name-class ParrotLink \
  --src integration_template/embeddings.py \
  --dst langchain_parrot_link/embeddings.py
```
Example embeddings code
Tools are used in two main ways:
- To define an "input schema" or "args schema" to pass to a chat model's tool calling feature along with a text request, such that the chat model can generate a "tool call", or parameters to call the tool with.
- To take a "tool call" as generated above, and take some action and return a response that can be passed back to the chat model as a ToolMessage.
Tools must inherit from the `BaseTool` base class. This interface has three properties and two methods that should be implemented in a subclass.
| Method/Property | Description |
|---|---|
| `name` | Name of the tool (also passed to the LLM). |
| `description` | Description of the tool (also passed to the LLM). |
| `args_schema` | Schema for the tool's input arguments. |
| `_run` | Run the tool with the given arguments. |
| `_arun` | Asynchronously run the tool with the given arguments. |
Properties

`name`, `description`, and `args_schema` are all properties that should be implemented in the subclass. `name` and `description` are strings that are used to identify the tool and provide a description of what the tool does. Both of these are passed to the LLM, and users may override these values depending on the LLM they are using as a form of "prompt engineering." Giving these a concise and LLM-usable name and description is important for the initial user experience of the tool.
`args_schema` is a Pydantic `BaseModel` that defines the schema for the tool's input arguments. This is used to validate the input arguments to the tool, and to provide a schema for the LLM to fill out when calling the tool. Similar to the `name` and `description` of the overall Tool class, the fields' names (the variable names) and descriptions (part of `Field(..., description="description")`) are passed to the LLM, and the values in these fields should be concise and LLM-usable.
Run Methods

`_run` is the main method that should be implemented in the subclass. This method takes in the arguments from `args_schema` and runs the tool, returning a string response. This method is usually called in a LangGraph `ToolNode`, and can also be called in a legacy `langchain.agents.AgentExecutor`.

`_arun` is optional because by default, `_run` will be run in an async executor. However, if your tool is calling any APIs or doing any async work, you should implement this method to run the tool asynchronously in addition to `_run`.
Implementation
You can start from the following template or langchain-cli command:
```bash
langchain-cli integration new \
  --name parrot-link \
  --name-class ParrotLink \
  --src integration_template/tools.py \
  --dst langchain_parrot_link/tools.py
```
Example tool code
Retrievers are used to retrieve documents from APIs, databases, or other sources based on a query. The `Retriever` class must inherit from the `BaseRetriever` base class. This interface has one attribute and two methods that should be implemented in a subclass.
| Method/Property | Description |
|---|---|
| `k` | Default number of documents to retrieve (configurable). |
| `_get_relevant_documents` | Retrieve documents based on a query. |
| `_aget_relevant_documents` | Asynchronously retrieve documents based on a query. |
Attributes

`k` is an attribute that should be implemented in the subclass. This attribute can simply be defined at the top of the class with a default value like `k: int = 5`. This attribute is the default number of documents to retrieve from the retriever, and can be overridden by the user when constructing or calling the retriever.
Methods

`_get_relevant_documents` is the main method that should be implemented in the subclass. This method takes in a query and returns a list of `Document` objects, which have two main properties:

- `page_content` - the text content of the document
- `metadata` - a dictionary of metadata about the document

Retrievers are typically invoked directly by a user, e.g. as `MyRetriever(k=4).invoke("query")`, which will automatically call `_get_relevant_documents` under the hood.
`_aget_relevant_documents` is optional because by default, `_get_relevant_documents` will be run in an async executor. However, if your retriever is calling any APIs or doing any async work, you should implement this method to run the retriever asynchronously in addition to `_get_relevant_documents` for performance reasons.
Implementation
You can start from the following template or langchain-cli command:
```bash
langchain-cli integration new \
  --name parrot-link \
  --name-class ParrotLink \
  --src integration_template/retrievers.py \
  --dst langchain_parrot_link/retrievers.py
```
Example retriever code
Next Steps
Now that you've implemented your package, you can move on to adding standard tests for your integration and making sure they pass.