Why AI development in 2026 looks different
Two years ago, AI development meant training your own model or pasting a chatbot into a website. Neither holds up today. LLMs are commodity infrastructure now — the differentiation has moved from the model to the architecture around it: how AI accesses your data, where it touches your existing processes, how decisions stay auditable. AI development in 2026 is an architecture discipline — and that is where we work.
What we mean by AI development. We do not build ChatGPT wrappers. We build custom AI solutions that integrate with your data model, your auth layer, and your audit trail:
- AI agents for defined workflows: multi-step procedures where the AI calls tools, checks intermediate results, and logs each step
- RAG systems on your knowledge: answers come from your documents, databases, and policies — with source citation, not from a vendor's training corpus
- Custom machine learning for your domain: classification, forecasting, anomaly detection — where a generic LLM is too vague
- Aligned with GDPR and nDSG: EU and Swiss hosting, open-source LLMs on request, data minimisation at every layer
- Measurable impact instead of AI theatre: we define before project start which metric must move — handling time, classification accuracy, share of tickets resolved
- Integration with your stack: ERP, CRM, databases, and internal APIs are connected through an auth and audit layer — not via copy-paste
What that looks like in practice: a RAG system over 12,000 internal PDFs replaces Confluence search. A classification model triages incoming service tickets before a human touches them. An AI agent drafts quotes — your team still does the final review. AI development as a tool, not as an end in itself.
Where AI development actually delivers
Not every task needs AI. These five fields have emerged from our practice as the production-grade patterns — architectures we have built, run, and iterated on repeatedly.
AI agents for service and back-office workflows
What AI agents do in 2026 — and what they do not. AI agents handle multi-step procedures in narrowly defined domains: read the request, check it against the knowledge base, call a tool, log the result, escalate to a human when needed. They do not replace a profession — they take the recurring 70 per cent off the desk so your team can give the demanding 30 per cent proper attention.
- Service agent for first responses: categorise requests, answer standard cases directly, hand complex cases — including a pre-check — to your team
- Document agent: extract invoices, delivery notes, and contracts; push them into the ERP; flag deviations
- Sync agent: reconcile records between CRM, ERP, and custom backend — with an audit trail kept revision-safe
- Research agent: market data, competitor updates, lead enrichment — as scheduled runs, not as a live chatbot
- Review agent: check incoming documents against your policies, document findings with source citation — the final approval step stays with a human
Document processing with RAG
In 2026, RAG is the most mature architectural pattern in AI development. Instead of letting the LLM guess, we index your documents in a vector database, retrieve the relevant passages before each answer, and ship source citations along. Hallucinations drop significantly because the model only has to phrase what your sources actually say.
- Make internal knowledge searchable: employees ask the AI instead of digging through SharePoint, Confluence, and old email threads
- Service answers from product docs: the AI grounds itself in your manuals and release notes, not in a vendor training corpus
- Contract and policy research: clauses, rules, and internal guidelines made semantically searchable — with a direct pointer to the location in the source document
- Sales enablement: product details, reference cases, and pricing structure from one source, not from ten parallel slide versions
What changes: response time on internal questions drops from minutes to seconds. Onboarding becomes noticeably easier — new colleagues can ask questions without interrupting anyone. Content stays current because the system reads your originals, not a stale export.
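Stripped to its core, the retrieval step looks like this: a minimal sketch with a toy corpus and hand-made vectors. In production the embeddings come from an embedding model and live in a vector database such as pgvector; names and figures here are purely illustrative.

```python
import math

# Toy corpus of (source, passage, embedding). In production the embeddings
# come from an embedding model and live in a vector database.
CORPUS = [
    ("handbook.pdf#p12", "Refunds are processed within 14 days.", [0.9, 0.1, 0.0]),
    ("policy.pdf#p3", "Remote work requires manager approval.", [0.1, 0.8, 0.2]),
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def retrieve(query_embedding, top_k=1):
    # Rank passages by similarity to the query and keep the best hits,
    # source reference included: this is what enables citation.
    ranked = sorted(CORPUS, key=lambda d: cosine(query_embedding, d[2]), reverse=True)
    return [(source, text) for source, text, _ in ranked[:top_k]]

hits = retrieve([0.85, 0.15, 0.0])
```

The retrieved passages and their source references are then placed into the LLM prompt, which is why every answer can carry a citation back to the original document.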
Custom machine learning for domain-specific tasks
When a generic LLM is too vague, custom ML development pays off: your own model, trained on your historical data, with measurable accuracy on your classes. Typically lighter, faster, and cheaper to run than an LLM call per request.
- Forecasting: sales, utilisation, or demand predictions from your historical data
- Classification: assign tickets, emails, and documents to the right category automatically
- Anomaly detection: spot irregularities in transactions, sensor data, or logs before they escalate
- Computer vision: image and video analysis for quality control, stock-taking, and visual inspection
- Specialised NLP: entity extraction, intent detection, sentiment in domains where an LLM stays too generic
LLM integration into existing platforms
When AI should sit inside your application, not next to it. We connect LLMs to your existing auth layer, route every call through your audit trail, and make behaviour controllable via prompts and configuration — without forcing your frontend team to suddenly learn AI engineering.
- Auth and role model respected: the LLM sees only what the user is allowed to see — RAG hits from out-of-scope areas are filtered at retrieval time, not censored after the fact
- Audit trail kept revision-safe: prompt, model version, sources, response, and timestamp end up in your logging — traceable on request
- Feature flags for AI capabilities: turn individual features on per tenant, per user group, or per region — a controlled rollout, not a big bang
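The audit-trail idea reduces to a few lines: every LLM call passes through one wrapper that records prompt, model version, sources, response, and timestamp. This is a simplified illustration (function and field names are ours, not a fixed API), but the shape is what lands in your logging.

```python
import time
from typing import Callable

def audited_call(llm: Callable[[str], str], prompt: str, model_version: str,
                 sources: list, log: list) -> str:
    """Route every LLM call through one structured audit record."""
    response = llm(prompt)
    log.append({
        "timestamp": time.time(),
        "model_version": model_version,
        "prompt": prompt,
        "sources": sources,
        "response": response,
    })
    return response

audit_log = []
stub_llm = lambda p: "stub answer"  # stands in for the real model call
answer = audited_call(stub_llm, "Summarise ticket #123", "llama-3-70b",
                      ["crm:ticket/123"], audit_log)
```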
Semantic search and knowledge base
Full-text search alone is often not enough. Semantic search finds the right document even when the searcher uses different words than the author. We combine classical full-text search, vector embeddings, and — depending on data volume and accuracy needs — a re-ranking model.
- Similarity search: "Have we seen a case like this before?" — the platform surfaces comparable tickets, parts, or contracts
- Cross-silo search: one search box, behind it SharePoint, ERP, wiki, and mail archive — results with source and permission filter
- Structured extracts: search results as JSON or tables for downstream processes — one of the underrated strengths of current AI systems
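One common way to merge full-text and vector results is reciprocal rank fusion. The sketch below shows only the scoring step; the constant k=60 is the usual default from the retrieval literature, and the document IDs are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked result lists (e.g. full-text and vector hits) into one.
    k=60 is the commonly used default; it damps the influence of lower ranks."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

fulltext_hits = ["doc_a", "doc_c", "doc_b"]  # keyword match order
vector_hits = ["doc_b", "doc_a", "doc_d"]    # embedding similarity order
fused = reciprocal_rank_fusion([fulltext_hits, vector_hits])
```

A document that ranks well in both lists (here doc_a) rises to the top, even when neither search alone put it first.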
How AI development with us works
An AI project rarely fails on the model — it fails on an unclear use case, thin data, or missing integration. Our approach is split into five phases, each with a defined deliverable. You can step out after any phase.
1. Use-case discovery and data audit (1–2 weeks)
- Workshop: which procedures consume the most time today, and where is an AI solution genuinely the right lever — rather than just a better script?
- Data audit: is there enough material in sufficient quality, in which format does it sit, and what needs cleanup before any model discussion?
- Impact hypothesis: which metric should move by how much, how do we measure it before project start, and when does the project count as failed?
- Feasibility sketch: is the right answer RAG, a custom ML model, an AI agent — or a noticeably simpler piece of software without AI?
- MVP definition: the smallest cut that delivers real impact, not a demo that only holds up during the pitch
2. Architecture and model selection (1–2 weeks)
Before the first line of code, we decide the tradeoffs together: open-source LLM in your own hands versus API vendor, EU or Swiss hosting, choice of vector database, auth and audit layer. These decisions carry the project for years — so no gut-feel call.
- Model choice: open-source (Llama, Mistral) for data control and cost, commercial APIs (Anthropic, OpenAI) for top language quality — often hybrid, by use case
- Hosting path: EU region (Frankfurt), Switzerland (Zurich), or on-prem — depending on industry, data classification, and nDSG requirements
- Vector database: PostgreSQL pgvector for existing systems, Qdrant or Weaviate at higher volumes
- Integration layer: how the AI accesses your data, where auth runs, where logging happens — before we write code
3. MVP development (typically 4–6 weeks)
Within four to six weeks, a near-production version handles real procedures — not a pitch-only demo. You test on your own data, we measure against the impact hypothesis from phase 1.
- Agent or RAG pipeline: from the first call to the logged response
- Indexing of your documents, embedding strategy, retrieval and re-ranking logic
- For custom ML: data preparation, model selection, validation against a hold-out set
- Connection to at least one target system (ERP, CRM, internal backend) including auth
- Eval suite: defined test cases against which we measure accuracy, latency, and cost
4. Integration into your existing stack
This is where the MVP becomes part of your platform — and this is where most AI projects in the market fall over. We wire the solution into your auth, your audit trail, your secrets management, and your monitoring. No shadow IT.
- Integration with your existing auth (SSO, OIDC, internal identity provider)
- Audit trail kept revision-safe: every AI decision logged with prompt, sources, and model version
- Secrets management: API keys and model access centrally managed, not in code repos
- Monitoring: response times, error rates, cost per request in your existing dashboard (Grafana, Sentry, LangFuse)
5. Operations and continued development
- Operating infrastructure: cloud (EU/Switzerland) or on-prem, depending on requirements — we run both
- Drift monitoring: spotting shifts in input data or response behaviour — before users complain
- Human-in-the-loop: critical decisions still pass through an approval step, documented and traceable
- Iteration with user feedback: prompts, model versions, retrieval strategies evolve from real usage — not from a vacuum
- Hand-off to your team: documentation, training, a clear line between "you maintain" and "we operate" — end-to-end responsibility on our side, if you want it that way
What does AI development cost?
An AI project is not a SaaS subscription — it is a clearly scoped build. These are the three price brackets we see in practice, as a fixed-price frame or on time and materials, depending on how sharply the use case is defined up front.
AI Agent — Entry
from 20,000 €
- One clearly scoped workflow (e.g. service triage, document extraction)
- LLM via API (Anthropic Claude, OpenAI, Google Gemini) or open source
- Connection to 2–3 of your existing tools
- Tool calls, logging, escalation path to a human
- Monitoring on response time, cost per request, error rate
- Documentation and hand-off to your team
- Timeline: 4–6 weeks
Custom AI development
from 50,000 €
- Your own ML model trained on your data, or a specialised RAG system
- Data pipeline with cleanup, enrichment, and versioning
- Eval suite and model assessment against defined accuracy thresholds
- API in your own hands: production-grade, documented, versioned
- RAG systems over your documents — with source citation in the output
- AI agent with multiple tools and multi-step tasks
- Timeline: 8–12 weeks
Enterprise AI platform
from 90,000 €
- Multiple agents or models, orchestrated through your backend layer
- Custom ML components plus RAG plus LLM integration from a single architecture
- Connection to ERP, CRM, and internal databases — with auth and audit layer
- Multi-stage approval workflows, tenant separation, role model
- Drift monitoring, latency and cost tracking, alerting into your existing stack
- Iteration from real usage: prompt versioning, model swaps, retrieval tuning
- Timeline: 12–20 weeks
How we measure impact. Before project start we define a concrete metric — handling time per service ticket, share of tickets resolved without human input, classification accuracy against a hold-out set. For a 20,000 € AI agent project, we work the maths through together: how many hours per month are addressed, what hosting and model calls cost — and when the build pays for itself. If the maths does not work, we do not take the project.
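The payback calculation itself is simple arithmetic. A sketch with purely illustrative figures (the hourly rate, hours saved, and run cost are placeholders, not quotes):

```python
def payback_months(build_cost_eur, hours_saved_per_month, hourly_rate_eur,
                   monthly_run_cost_eur):
    """Months until the build pays for itself; None if it never does."""
    monthly_net_saving = hours_saved_per_month * hourly_rate_eur - monthly_run_cost_eur
    if monthly_net_saving <= 0:
        return None  # the maths does not work -> we do not take the project
    return build_cost_eur / monthly_net_saving

# Illustrative: a 20,000 EUR agent, 60 h/month saved at 70 EUR/h,
# 500 EUR/month for hosting and model calls.
months = payback_months(20_000, 60, 70, 500)
```

With these example figures the build amortises in roughly five and a half months; with too few hours saved, the function returns None, which is exactly the case where we decline the project.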
Technology stack for modern AI development
We use mature, production-proven tools — not experiments nobody will maintain next quarter. Which concrete building blocks we pick depends on where your data is allowed to live, which accuracy requirements apply, and how much you want to operate yourself.
LLMs, RAG frameworks, and vector databases
| Open-source LLMs and API providers | Open-source models (Llama, Mistral) sit in your own hands and are the first choice when data must not leave the building. We use Anthropic Claude, OpenAI ChatGPT, and Google Gemini deliberately where language quality or tool use is decisive. Hybrid setups — open source for sensitive procedures, APIs for complex language tasks — are often the pragmatic middle ground in 2026. |
| LangChain and LlamaIndex | Production-tested frameworks for RAG systems and AI agents — tool calls, memory, multi-step pipelines. We do not rebuild every Lego brick ourselves; we use what has hardened in the community — and complement it with our own code where things become specific. |
| PostgreSQL pgvector, Qdrant, and Weaviate | Vector databases hold embeddings of your documents so a RAG system finds the right sources in milliseconds. PostgreSQL pgvector is enough for many SMB setups and reuses your existing database skills. At higher volumes or with strict isolation we move to Qdrant or Weaviate — both EU-hostable. |
Custom machine learning and operations
| Python ML stack | scikit-learn for classical ML, PyTorch for deep learning, Pandas and NumPy for data preparation. Proven, well documented, large community — and maintainable when the team rotates two years from now. |
| MLOps and model operations | MLflow for experiment versioning, Docker for reproducible deployment, FastAPI as the model API. LangFuse as an observability layer for LLM-based applications — prompt versions, trace inspection, cost per call. Sentry for classical errors, Grafana for dashboards. |
| Hosting in the EU and Switzerland | AWS Frankfurt, Azure Germany/Switzerland, Google Zurich, Hetzner, IONOS, Exoscale, and pure on-prem scenarios — all paths we have already walked. Which option fits is decided by data classification and nDSG requirements, not by gut feeling. |
How we choose — and what we leave out
Open source versus API. For sensitive data, high volumes, or real data-sovereignty requirements, a self-hosted open-source LLM almost always wins. For rare, linguistically demanding requests, a commercial provider delivers more per euro. We recommend the path per use case, not per belief.
Frameworks versus your own code. LangChain and LlamaIndex save weeks — as long as your use case stays inside the intended patterns. The moment custom logic is needed (own escalation paths, unusual tool calls, multiple tenants), we lift parts out and write them ourselves. Mix, not religion.
Managed service versus your own infrastructure. Managed services (AWS Bedrock, Vertex AI) are fast to set up but cost data sovereignty and carry vendor risk. Running it yourself is more expensive at first and often cheaper and more independent in the long run. We set up what fits your risk and operations stance — and say so plainly when both options are valid.
Specialised stacks for particular needs
Our standard stack covers roughly 80 per cent of mid-market AI projects. For the remaining 20 per cent, we reach for the following options:
| R and statistical modelling | If your team already works in R — in regulated areas or classical statistics — we build the model logic there and expose it through a production-grade interface. |
| On-device and edge AI | For mobile apps, IoT devices, or scenarios without stable cloud connectivity, we deploy compact models directly on the device — low latency, no recurring API cost, clear data sovereignty. |
| Fine-tuning and custom transformers | When a standard LLM stays too vague in your domain: targeted fine-tuning or custom transformer variants — appropriate when data and volume justify the effort. |
AI development GDPR- and nDSG-compliant
An AI system changes the compliance picture in two ways: first, data flows through an additional model that interprets. Second, the <a href="/en/blog/ai-act-ki-verordnung-software-architektur"><strong>EU AI Act</strong></a> has been entering into force in stages since August 2024 — obligations for GPAI models from August 2025, full applicability from August 2026. We account for both layers from the start.
Data minimisation and protection of personal data
- Only the data actually needed: AI agents and RAG systems access exclusively explicitly cleared sources — checked at the auth layer, not in the prompt
- Pseudonymisation before the model: personal data is removed or pseudonymised before training or inference, where the use case allows
- Retention rules: prompts, responses, and embeddings are deleted after defined periods — automated, documented, auditable
- Encryption: data at rest (AES-256) and in transit (TLS 1.3) encrypted — standard, not up for debate
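The retention rule reduces to a periodic purge. A sketch with an assumed 90-day window (in practice the period is set per data class, and the deleted IDs go to the audit log so the purge itself stays traceable):

```python
RETENTION_DAYS = 90  # assumed window; in practice set per data class

def purge_expired(records, today):
    """Split stored prompts/embeddings into kept records and deleted IDs.
    The deleted IDs are logged so the purge itself is auditable."""
    kept = [r for r in records if today - r["created_day"] <= RETENTION_DAYS]
    deleted = [r["id"] for r in records if today - r["created_day"] > RETENTION_DAYS]
    return kept, deleted

records = [
    {"id": "prompt-1", "created_day": 10},   # 240 days old -> purge
    {"id": "prompt-2", "created_day": 200},  # 50 days old -> keep
]
kept, deleted = purge_expired(records, today=250)
```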
Hosting in the EU and Switzerland
- EU region and Switzerland: AWS Frankfurt, Azure Germany and Switzerland, Google Zurich, alternatively Hetzner, IONOS, or Exoscale — data and models stay in your own hands
- On-prem deployment: open-source LLMs and custom ML models run in your data centre when data class or industry obligations demand it
- Open-source LLMs: Llama or Mistral — you fully control weights, updates, and inference path
- Without hyperscaler dependency: on request, fully without US hyperscalers — often the clean choice with sensitive data
AI Act, audit trail, and explainability
- Audit trail kept revision-safe: prompt, model version, source hits, response, and timestamp for every AI call — one of the most important architectural decisions in a production-grade AI system
- Human oversight for high-risk applications: where the AI Act requires it, every automated decision passes through a documented approval step — no black-box automation
- Explainability of the decision: for every answer, the sources and rules that led to the result are visible
- GDPR Art. 22 — right to explanation: for automated decisions with legal effect, those affected receive a traceable explanation — the architecture provides for it, rather than reconstructing it after the fact
Measuring impact, controlling latency and cost
Three dimensions decide whether an AI system holds up in production: response time, accuracy, and cost per request. We treat all three explicitly — with numbers you can measure the project against three months in.
Response time
- Quantised models: smaller bit widths (4-bit, 8-bit) reduce inference time markedly, with a moderate accuracy trade-off — we assess size and suitability per use case
- Response caching: recurring questions with identical context come from the cache instead of bothering the model again
- Streaming responses: tokens flow as soon as they are produced — the user sees that the system is working immediately
- Edge deployment where it fits: compact models directly on site or on device — low latency, clear data sovereignty, no API cost per call
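Response caching in its simplest form: answers keyed on a hash of prompt plus retrieved context, so identical questions skip the model entirely. This is a sketch; a production cache also needs invalidation when sources change and a size limit.

```python
import hashlib

class ResponseCache:
    """Answers keyed on prompt plus retrieved context: identical questions
    with identical context are served without hitting the model again."""
    def __init__(self):
        self._store = {}

    def _key(self, prompt, context):
        return hashlib.sha256(f"{prompt}|{context}".encode()).hexdigest()

    def get_or_call(self, prompt, context, llm):
        key = self._key(prompt, context)
        if key not in self._store:
            self._store[key] = llm(prompt, context)
        return self._store[key]

calls = []
def stub_llm(prompt, context):
    calls.append(prompt)  # counts how often the model is actually hit
    return "answer"

cache = ResponseCache()
first = cache.get_or_call("question", "ctx", stub_llm)
second = cache.get_or_call("question", "ctx", stub_llm)  # served from cache
```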
Cost per request
- Routing per request type: simple requests to a smaller open-source model, demanding ones to Claude or Gemini — the mix is markedly cheaper than "everything via the most expensive API"
- Prompt discipline: shorter, structured prompts without ballast lower token cost and improve accuracy at the same time
- Self-hosted LLMs: from moderate volume onwards, running it yourself is usually cheaper and more independent than API calls
- Batch processing: collect non-time-critical tasks and process them in batches instead of real time — lower peak load, lower cost
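Routing per request type can be as small as one function. In this sketch a crude word-count heuristic stands in for a real complexity classifier; the model names are placeholders.

```python
def route(request_text, word_threshold=30):
    """Pick a model per request. The word count stands in for a real
    complexity classifier: a deliberate simplification for illustration."""
    if len(request_text.split()) < word_threshold:
        return "small-open-source"  # cheap self-hosted model
    return "premium-api"            # e.g. Claude or Gemini

simple = route("Where is my invoice?")
demanding = route(" ".join(["word"] * 40))
```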
Scaling and drift
- Horizontal scaling: multiple inference instances behind a load balancer — important when the AI system becomes part of a real-time product
- Auto-scaling: instances spin up during peaks and shut down again in quieter periods
- Queue-based architecture: requests decoupled through a queue (e.g. RabbitMQ, NATS) — protects against load spikes and simplifies retry logic
- Drift monitoring: input data and response behaviour shift over time — we measure actively (with LangFuse or Sentry, for example) and react before users notice. More hosting paths in Cloud Services.
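At its core, drift monitoring compares a recent window of a metric against a baseline. A sketch with average input length as the monitored metric and an assumed relative threshold of 20 per cent:

```python
def mean(values):
    return sum(values) / len(values)

def drift_alert(baseline, recent, threshold=0.2):
    """Flag when a monitored metric (here: average input length) shifts by
    more than the relative threshold against the baseline window."""
    shift = abs(mean(recent) - mean(baseline)) / mean(baseline)
    return shift > threshold

baseline_lengths = [100, 102, 98, 101, 99]  # input lengths last quarter
recent_lengths = [150, 148, 152, 149, 151]  # inputs this week
alert = drift_alert(baseline_lengths, recent_lengths)
```

Production setups track several such metrics at once (input length, retrieval scores, refusal rate) and alert into the existing monitoring stack instead of returning a boolean.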
Integration into your existing IT landscape
An AI solution creates impact only when it lives inside the data flows people already work with. We connect AI to your existing systems — through auth, audit trail, and secrets management, not through copy-paste or a browser plugin.
- ERP systems: SAP, Microsoft Dynamics, Odoo, proAlpha — read and write through official APIs, respecting the permission model
- CRM integration: Salesforce, HubSpot, Pipedrive — enriching customer data, lead scoring, research agent
- Communication tools: Slack, Microsoft Teams, email — AI capabilities where your team already works, not in yet another tool
- Databases: PostgreSQL, MongoDB, MS SQL, Snowflake — direct read path for RAG, with row- and field-level permissions
- Document stores: SharePoint, Google Drive, S3, Nextcloud — indexing with permission filter, so RAG only surfaces what the user is allowed to see
- Your own APIs: internal services via REST or GraphQL — versioned, documented, with the auth chain running from the AI through to the target system
Maintenance and continued development
An AI system does not end with go-live. Models, prompts, and data age. Our maintenance model keeps the solution dependable over years — with clear responsibilities between you and us.
- Drift and accuracy monitoring: we continuously measure whether input data and response behaviour are shifting — and alert before users complain
- Model refresh: for custom ML, periodic re-training on updated data; for LLMs, a structured switch to new model versions with eval comparison
- Prompt versioning: prompts are versioned like code, tested, and rolled back when needed — no silent changes in production
- Cost and latency monitoring: calls per day, average response time, cost per request — visible in your dashboard, not in a spreadsheet
- RLHF and iteration from real usage: thumbs-up/-down feedback and corrections flow back into the system in a structured way, improving answer quality over time
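Prompt versioning in its simplest form: prompts live in a versioned registry next to the code, and a rollback is just flipping the active pointer back. The structure and prompt texts here are illustrative.

```python
# Prompts are versioned like code; the active pointer decides which version
# runs, and a rollback is just flipping it back to the previous entry.
PROMPTS = {
    "triage": {
        "v1": "Classify this ticket: {text}",
        "v2": "Classify this ticket into billing/technical/other: {text}",
    }
}
ACTIVE = {"triage": "v2"}

def render(task, **kwargs):
    version = ACTIVE[task]
    return PROMPTS[task][version].format(**kwargs), version

prompt, version = render("triage", text="App crashes on login")
```

Because every rendered prompt carries its version, the audit trail can always say which prompt produced which answer.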
More on the operations and maintenance model: Maintenance & Support.
Why IntegrIT for mid-market AI development?
- Senior engineering, no junior pool: you talk to engineers who have been shipping software for years — not to an account layer that just forwards tickets
- AI solutions as an architecture discipline: we understand LLMs, vector databases, and agent pipelines — and equally the auth, audit, and data-model layer all of that has to embed into
- First near-production version in 4–6 weeks: we build for real impact, not for the demo at the steering committee in two months
- Aligned with GDPR, nDSG, and the EU AI Act: hosting in the EU or Switzerland, audit trail kept revision-safe, risk class assessed per use case
- Impact before kickoff: we define the success metric with you up front — and say "no" honestly when a use case does not pay off
- Code, models, and data in your own hands: none of it sits with us or a sub-vendor — you can end the contract at any time without losing data
- End-to-end responsibility across the platform: we also build your backend and your apps — AI as an integral part of your architecture, instead of three vendors arguing over interfaces
Next step: framing AI development together
Send us a short note describing what you have in mind — informally to development@integritsol.de, or via the Calendly link below. We reply within one working day and set up a first conversation. You get an honest assessment, not a sales tour.
First conversation about your AI development project
30 to 60 minutes, no obligation. We go through one or two use cases, look at data situation and architecture, and tell you directly whether an AI solution is the right path — or whether a simpler piece of software is enough.
Or call directly: +49 1522 3635395