Course Outline
Introduction to Multimodal AI and Ollama
- Overview of multimodal learning
- Key challenges in vision-language integration
- Capabilities and architecture of Ollama
Setting Up the Ollama Environment
- Installing and configuring Ollama
- Working with local model deployment
- Integrating Ollama with Python and Jupyter
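A minimal sketch of the setup flow this module covers, assuming the Ollama server is already installed and running locally and that the `ollama` Python client is available (the model name `llama3.2` is illustrative):

```python
# Assumes the Ollama server is running locally (e.g. via `ollama serve`)
# and the client was installed with `pip install ollama`.
import ollama

# Pull a model into the local store (model name is illustrative).
ollama.pull('llama3.2')

# Send a first chat request to verify the Python/Jupyter integration works.
response = ollama.chat(
    model='llama3.2',
    messages=[{'role': 'user', 'content': 'Say hello in one sentence.'}],
)
print(response['message']['content'])
```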
Working with Multimodal Inputs
- Text and image integration
- Incorporating audio and structured data
- Designing preprocessing pipelines
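As a sketch of text-and-image integration, the snippet below sends a local image alongside a text prompt to a vision-capable model; the model name `llava` and the file path are assumptions, not course-mandated choices:

```python
import ollama

# Vision-capable models accept an `images` list alongside the text prompt;
# entries can be file paths or raw bytes. Path and model name are illustrative.
response = ollama.chat(
    model='llava',
    messages=[{
        'role': 'user',
        'content': 'Describe this image in two sentences.',
        'images': ['./photo.jpg'],
    }],
)
print(response['message']['content'])
```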
Document Understanding Applications
- Extracting structured information from PDFs and images
- Combining OCR with language models
- Building intelligent document analysis workflows
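One possible shape for the OCR-plus-language-model workflow in this module: extract raw text with Tesseract, then ask a local model to structure it. The use of `pytesseract`, the file path, and the prompt wording are assumptions for illustration:

```python
import ollama
import pytesseract           # pip install pytesseract (requires the Tesseract binary)
from PIL import Image

# Step 1: OCR the scanned page into plain text.
raw_text = pytesseract.image_to_string(Image.open('./invoice.png'))

# Step 2: Ask a local model to pull structured fields out of the OCR output.
response = ollama.chat(
    model='llama3.2',
    messages=[{
        'role': 'user',
        'content': 'Extract the invoice number, date, and total as JSON:\n' + raw_text,
    }],
)
print(response['message']['content'])
```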
Visual Question Answering (VQA)
- Setting up VQA datasets and benchmarks
- Training and evaluating multimodal models
- Building interactive VQA applications
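A toy interactive VQA loop in the spirit of this module; the model name and image path are placeholders:

```python
import ollama

IMAGE = './scene.jpg'  # placeholder path

# Simple interactive loop: each question is answered about the same image.
while True:
    question = input('Question (blank to quit): ').strip()
    if not question:
        break
    response = ollama.chat(
        model='llava',
        messages=[{'role': 'user', 'content': question, 'images': [IMAGE]}],
    )
    print(response['message']['content'])
```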
Designing Multimodal Agents
- Principles of agent design with multimodal reasoning
- Combining perception, language, and action
- Deploying agents for real-world use cases
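A minimal perception-language-action sketch for this module: the model inspects an image and its reply is mapped to one of a few actions. The action set and routing logic are invented purely for illustration:

```python
import ollama

# Hypothetical action handlers; a real agent would do actual work here.
ACTIONS = {
    'archive': lambda: print('-> archiving document'),
    'escalate': lambda: print('-> escalating to a human'),
}

# Perception + language: ask the model to choose an action for the input image.
response = ollama.chat(
    model='llava',
    messages=[{
        'role': 'user',
        'content': 'Reply with exactly one word, archive or escalate, '
                   'depending on whether this form looks complete.',
        'images': ['./form.png'],
    }],
)

# Action: route on the model's normalized answer, defaulting to escalation.
choice = response['message']['content'].strip().lower()
ACTIONS.get(choice, ACTIONS['escalate'])()
```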
Advanced Integration and Optimization
- Fine-tuning multimodal models with Ollama
- Optimizing inference performance
- Scalability and deployment considerations
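For the inference-optimization topic, a sketch of two common knobs: per-request `options` and streaming to reduce perceived latency. The specific option values are illustrative, not recommendations:

```python
import ollama

# Streaming yields chunks as they are generated, which lowers perceived
# latency; `options` tunes the runtime (values here are illustrative).
stream = ollama.chat(
    model='llama3.2',
    messages=[{'role': 'user', 'content': 'Summarize multimodal learning.'}],
    options={'temperature': 0.2, 'num_ctx': 4096},
    stream=True,
)
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
print()
```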
Summary and Next Steps
Requirements
- Strong understanding of machine learning concepts
- Experience with deep learning frameworks such as PyTorch or TensorFlow
- Familiarity with natural language processing and computer vision
Audience
- Machine learning engineers
- AI researchers
- Product developers integrating vision and text workflows
21 Hours