The World of Endless Opportunities Powered by Large Action Models

June 3, 2024

By: Hakimuddin Bawangaonwala, Senior Consultant, GTO

This decade has been dominated by AI advancements. Generative AI took center stage in late 2022, and by 2023 it had made significant inroads into business. Large language models (LLMs) address many business problems, but they mainly respond to queries; there is a pressing need for systems that can perform tasks and make decisions autonomously. Large action models (LAMs) show this potential. Organizations and research institutions are actively exploring how to integrate LAMs into our daily lives.

Large Action Models (LAMs) are derived from large language models (LLMs) and serve as an extension of LLMs by transforming them into autonomous agents. These software units can execute tasks and make decisions without human intervention. Instead of simply responding to user queries, LAMs utilize the linguistic proficiency of LLMs to carry out tasks and decision-making processes independently.

LAMs leverage multimodal data from different sources, such as text, images, audio, and more, to effectively simulate various applications and human actions. This eliminates the need for temporary demonstrations or textual explanations. By integrating such modalities, LAMs can better understand complex real-world situations, enabling numerous applications across various sectors. In this blog, we will discuss some potential applications of LAMs.

Potential application areas for LAM

Manufacturing

LAMs can enable software-driven vehicles to act independently. These vehicles depend on sensors, cameras, LIDAR (Light Detection and Ranging), radar, and other data sources to comprehend their surroundings and make instantaneous judgments. By integrating data from diverse modalities with Vision-Language Navigation (VLN), AI systems can precisely identify objects, workers, pickup and drop-off points, and other vital components of the driving environment, facilitating safe and efficient transportation.

Media and Entertainment

Scene synthesis is crucial for creating immersive environments. It involves various tasks such as creating three-dimensional (3D) scenes, designing terrains, placing objects, implementing realistic lighting, and incorporating dynamic weather systems. To create vast open-world environments in modern movies, LAMs powered by large foundation models can assist scene designers by devising unique landscape design rules that align with their preferences and the scene’s requirements. This ensures semantic consistency and variability in the generated assets, preventing repetitive patterns.

Healthcare

In the medical field, model hallucinations can pose risks, potentially resulting in severe harm or fatalities for patients. To limit that risk, large action models in healthcare that are trained on extensive web and clinical trial data can serve as knowledge retrieval or text generation-based retrieval systems. Because they cover diverse languages, cultures, and health conditions, they can form a broad medical knowledge base. Pairing healthcare professionals with such medical knowledge retrieval agents can minimize hallucinations and improve the accuracy and precision of responses.

Retail

A LAM can link a customer’s profile and digital wallet before the customer enters the store. It uses data from sensors and cameras to identify and track individuals and guide them to the shelves according to their shopping lists. Each item a customer picks up is recorded in a virtual shopping cart; the system identifies the item by comparing the product images on video with the retailer’s database. Once shopping is completed and the product list is finalized, the customer can leave the store. As they exit the area monitored by the cameras, the computer vision technology recognizes this as the end of the shopping session. The system then calculates the total cost of the items and deducts it from the customer’s digital wallet.
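The session flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a real checkout-free system: the `ShoppingSession` class, its method names, and the integer-cents prices are all assumptions made for the example, and the computer vision step is reduced to method calls that a vision system would presumably trigger.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ShoppingSession:
    """Hypothetical in-store session linked to a customer's digital wallet."""
    wallet_cents: int
    cart: Dict[str, int] = field(default_factory=dict)  # SKU -> quantity

    def add_item(self, sku: str) -> None:
        # Assumed to be called when a picked-up product is matched to a catalog SKU.
        self.cart[sku] = self.cart.get(sku, 0) + 1

    def remove_item(self, sku: str) -> None:
        # Assumed to be called when the customer puts a product back on the shelf.
        if self.cart.get(sku, 0) > 0:
            self.cart[sku] -= 1
            if self.cart[sku] == 0:
                del self.cart[sku]

    def checkout(self, prices_cents: Dict[str, int]) -> int:
        # Assumed to be called when the customer leaves the camera-monitored area.
        total = sum(prices_cents[sku] * qty for sku, qty in self.cart.items())
        if total > self.wallet_cents:
            raise ValueError("insufficient wallet balance")
        self.wallet_cents -= total
        return total


if __name__ == "__main__":
    session = ShoppingSession(wallet_cents=5000)
    session.add_item("milk")
    session.add_item("bread")
    session.remove_item("bread")
    print(session.checkout({"milk": 250, "bread": 300}), session.wallet_cents)
```

Prices are kept as integer cents so that the wallet arithmetic stays exact; a production system would also need to handle identity, payment failures, and disputed items, none of which are modeled here.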

Banking and Financial Services (BFS)

LAMs in banking can be used for facial recognition to enhance customer authentication in various areas, such as mobile apps, online banking, and ATM transactions. They analyze text-based queries, document images, voice commands, and APIs to suggest the best course of action for customers. Drawing on user history, a LAM learns the necessary information and provides tailored recommendations. Because LAMs exhibit a human-like thinking process and comprehend user intent, banks can adopt them to integrate multiple applications, replacing robotic process automation. By examining customer behavior across online and mobile banking channels, a LAM can identify irregular patterns, such as unexpected transactions or login anomalies, and trigger alerts for potentially fraudulent activities.
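To make the idea of flagging irregular patterns concrete, here is a deliberately simple sketch. It is not how a LAM works internally: it swaps in a plain statistical rule (a z-score against the customer's past transaction amounts) as a stand-in for the learned behavior model. The function name `is_unusual` and the default threshold of three standard deviations are assumptions chosen for illustration.

```python
from statistics import mean, stdev
from typing import List


def is_unusual(history: List[float], amount: float, threshold: float = 3.0) -> bool:
    """Return True if `amount` lies more than `threshold` sample standard
    deviations from the mean of the customer's past amounts."""
    if len(history) < 2:
        return False  # not enough history to estimate a spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past amounts are identical, so any different amount stands out.
        return amount != mu
    return abs(amount - mu) / sigma > threshold


if __name__ == "__main__":
    past = [40.0, 55.0, 60.0, 45.0, 50.0]
    print(is_unusual(past, 52.0))   # close to the usual range
    print(is_unusual(past, 900.0))  # far outside the usual range
```

A real fraud system would combine many signals (device, location, login timing) rather than a single amount, but the pattern is the same: compare new behavior with a baseline and raise an alert when the deviation is large.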

Robotics

LAM systems integrate advanced foundation model technologies as encoders to process input information, enabling robots to execute actions based on linguistic instructions and visual cues. These systems also possess advanced language processing capabilities, allowing them to interpret instructions and break them down into sequential robot action steps, enhancing task-planning technologies.
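The step of breaking an instruction into sequential robot actions can be illustrated with a toy planner. In a LAM the language understanding would come from a foundation model; the sketch below replaces that with a hand-written rule that handles only instructions of the form "put X on Y". The `ActionStep` type and the primitive names (`move_to`, `grasp`, `release`) are hypothetical and do not refer to any particular robot API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ActionStep:
    primitive: str  # hypothetical robot primitive, e.g. "move_to" or "grasp"
    target: str     # object or location the primitive applies to


def plan_pick_and_place(obj: str, destination: str) -> List[ActionStep]:
    """Expand a pick-and-place goal into an ordered list of primitives."""
    return [
        ActionStep("move_to", obj),
        ActionStep("grasp", obj),
        ActionStep("move_to", destination),
        ActionStep("release", obj),
    ]


def parse_instruction(text: str) -> List[ActionStep]:
    """Rule-based stand-in for the language step: supports only 'put X on Y'."""
    words = text.lower().rstrip(".").split()
    if len(words) >= 4 and words[0] == "put" and "on" in words:
        split = words.index("on")
        obj = " ".join(w for w in words[1:split] if w != "the")
        dest = " ".join(w for w in words[split + 1:] if w != "the")
        if obj and dest:
            return plan_pick_and_place(obj, dest)
    raise ValueError(f"unsupported instruction: {text!r}")


if __name__ == "__main__":
    for step in parse_instruction("Put the red block on the table."):
        print(step.primitive, step.target)
```

The value of the decomposition is that each primitive can be executed and checked on its own, which is what makes the resulting plan usable by a task-planning layer.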

Conclusion

Although generative AI models are gaining popularity, most still struggle to orchestrate various data classes. Developers can draw on the knowledge of general-purpose foundation models and build a neural network on top of them to orchestrate multiple data classes initially. As the model is trained over time, it can be deployed in new scenarios to foster better collaboration between humans and the model.

Significant work is under way to interlace human interaction with the LAM knowledge interface, effectively simulating the combination of different applications and the human actions performed on them. Advances in neuro-symbolic programming make this capability possible.

LAMs combine different input modes and change how intelligent systems operate, enabling them to comprehend and engage with the environment in ways that mimic human behavior. The potential uses of LAMs span various sectors, from manufacturing to healthcare and image search, providing innovative solutions to intricate problems in different fields. As research progresses, we expect to see many more creative applications in the coming years.

Hakimuddin Bawangaonwala

Senior Consultant, GTO

Hakimuddin is a seasoned consultant with over four years of experience in the industry. Specializing in investigating beyond-the-horizon technologies, Hakimuddin has worked on a diverse range of technologies, helping organizations create use cases for quick incubation and industrialization. Hakimuddin holds a master’s degree in design engineering and has published numerous articles and whitepapers on emerging technologies. In addition to consulting, Hakimuddin enjoys collaborating on deep research projects and contributing to the community.
