The prompt for a customer support chatbot typically contains relevant context about the conversation, such as order details and a summary of the conversation so far, along with the most recent messages. This use case requires a few thousand examples so that the chatbot can handle different types of requests and customer issues. Because the model learns to imitate the agent messages, vet the conversation samples to make sure those messages are of high quality.
One way to organize the dataset for this use case is to create multiple rows for each past conversation, one per agent message: each row's prompt holds the context up to that point, and the completion is the agent's next response. The dataset could look like this:
{"prompt":"Summary: <summary of the interaction so far>\n\nSpecific information:<for example order details in natural language>\n\n###\n\nCustomer: <message1>\nAgent: <response1>\nCustomer: <message2>\nAgent:", "completion":" <response2>\n"}
{"prompt":"Summary: <summary of the interaction so far>\n\nSpecific information:<for example order details in natural language>\n\n###\n\nCustomer: <message1>\nAgent: <response1>\nCustomer: <message2>\nAgent: <response2>\nCustomer: <message3>\nAgent:", "completion":" <response3>\n"}
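The row-per-agent-message layout can be generated mechanically from a stored conversation. The following is a minimal sketch, not a definitive implementation: the function names `build_rows` and `to_jsonl` are hypothetical, and for simplicity it assumes one fixed summary and one block of specific information per conversation, whereas a real pipeline might regenerate the summary for each row.

```python
import json

# Separator between the context header and the dialogue, as in the rows above.
SEPARATOR = "\n\n###\n\n"


def build_rows(summary, specific_info, turns):
    """Expand one conversation into one prompt/completion row per agent message.

    `turns` is an ordered list of (customer_message, agent_response) pairs.
    Hypothetical helper: the summary and specific information are assumed
    to stay the same for every row of the conversation.
    """
    header = f"Summary: {summary}\n\nSpecific information: {specific_info}{SEPARATOR}"
    rows = []
    history = ""
    for customer_message, agent_response in turns:
        # The prompt ends right where the agent is about to speak.
        history += f"Customer: {customer_message}\nAgent:"
        rows.append({"prompt": header + history, "completion": f" {agent_response}\n"})
        # Append the real response so it becomes context for the next row.
        history += f" {agent_response}\n"
    return rows


def to_jsonl(rows):
    """Serialize rows as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows) + "\n"
```

Calling `build_rows` with two turns yields two rows: the first asks the model for the first agent response, and the second includes that response in its prompt and asks for the next one.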
The summary of the conversation can be generated with a separate text transformation fine-tuned model. Including the summary gives the chatbot the context of the earlier parts of the conversation in a compact form, so the prompt does not need to contain every previous message.
When creating the dataset, it is important to consider the various types of requests that customers may have. These can include inquiries about the status of an order, reporting an issue with a product, or requesting a refund. It is also important to consider the different ways that customers may phrase their requests and to include a variety of different customer messages in the dataset.
For example:
Example 1:
{"prompt":"Summary: Customer is inquiring about the status of their recent order.\n\nSpecific information: Order number 12345, placed on January 1st.\n\n###\n\nCustomer: Hi, can you tell me the status of my order?\nAgent: <response1>\nCustomer: Thanks, can you also tell me when it will be delivered?\nAgent:", "completion":" <response2>\n"}
Example 2:
{"prompt":"Summary: Customer is reporting an issue with a product they received.\n\nSpecific information: Order number 54321, received on December 15th.\n\n###\n\nCustomer: I received my order but the product is damaged.\nAgent: <response1>\nCustomer: Can you send me a replacement?\nAgent:", "completion":" <response2>\n"}
Example 3:
{"prompt":"Summary: Customer is requesting a refund for an item.\n\nSpecific information: Order number 11111, placed on November 1st.\n\n###\n\nCustomer: I would like to request a refund for this item.\nAgent: <response1>\nCustomer: Can you provide me the details on how to process the refund?\nAgent:", "completion":" <response2>\n"}
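Rows like the three above are easy to get subtly wrong, for example by dropping the leading space in the completion. The sketch below checks each JSONL line against the conventions visible in these examples (the `###` separator, a prompt ending in `Agent:`, a completion that starts with a space and ends with a newline). The function name `check_record` is hypothetical, and the rules are inferred from the examples rather than taken from any official validator.

```python
import json


def check_record(line):
    """Return a list of formatting problems found in one JSONL line.

    Hypothetical checker: the rules mirror the example rows in this
    section and may need adjusting for a different prompt layout.
    """
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(record, dict):
        return ["record is not a JSON object"]

    prompt = record.get("prompt")
    completion = record.get("completion")
    if not isinstance(prompt, str) or not isinstance(completion, str):
        return ["record needs string 'prompt' and 'completion' fields"]

    problems = []
    if "\n\n###\n\n" not in prompt:
        problems.append("prompt is missing the ### separator")
    if not prompt.endswith("\nAgent:"):
        problems.append("prompt does not end with 'Agent:'")
    if not completion.startswith(" "):
        problems.append("completion does not start with a space")
    if not completion.endswith("\n"):
        problems.append("completion does not end with a newline")
    return problems
```

An empty list means the line follows the layout; anything else names the rule it breaks, which makes it straightforward to run over a whole file and report problems per line.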
The responses the chatbot generates depend on the dataset and on how the model was trained. It is therefore essential to keep updating and improving the dataset so that the chatbot’s performance stays high.
Additionally, when creating the dataset, consider the different languages and cultures of the customers, and build a diverse dataset that covers the languages, dialects, and terminology they use.
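One simple way to check diversity is to tally the conversations by a label such as request type or language and flag the labels that make up only a small share of the dataset. This is a rough sketch under stated assumptions: the labels are assumed to have been assigned during review, and the function name `underrepresented` and the 10% default threshold are illustrative choices, not recommendations from the source.

```python
from collections import Counter


def underrepresented(labels, min_share=0.1):
    """Return the labels whose share of the dataset is below `min_share`.

    `labels` holds one label per conversation, e.g. a request type or a
    language code. Both the labels and the threshold are hypothetical.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    # With no labels the loop body never runs, so there is no division by zero.
    return sorted(label for label, n in counts.items() if n / total < min_share)
```

Labels returned by this function are candidates for collecting or writing more examples before the next round of training.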
In conclusion, a high-quality dataset is crucial for the performance of a customer support chatbot. It should reflect the different types of requests customers may have, the different ways they may phrase those requests, and the various languages and cultures of the customers. With a dataset that is organized in a structured manner and continuously updated and improved, the chatbot can provide accurate and efficient responses to customer inquiries.