AI Chatbot · Human-in-the-Loop · RAG · OCR · Real Estate Consultation Automation · Customer Case Study

One AI Doing the Work of Five Consultants: A Case Study in AI Consultation Automation for Property Registration Documents

손진호, CEO · December 7, 2025 · 12 min read

"Why Do I Get Different Answers From Different Consultants?"

I first heard this complaint at a kickoff meeting with Client A, a Korean real estate legal service company. Their core offering: customers submit a property registration document (등기부등본), and consultants analyze ownership rights, mortgage exposure, and legal risk. The problem? Customers kept noticing inconsistencies.

When leadership pulled the data, the reason was obvious. A 10-year veteran consultant and a 6-month newcomer would look at the same document and reach different conclusions. Interpretation depth, case precedent awareness, instinct for red flags — it all varied by person. Worse, the senior consultants who delivered the best outcomes had hit a hard ceiling on how many clients they could handle per day.

Their question to us: "Could a senior consultant, with AI support, serve dramatically more clients at consistent quality?"

The Structural Problem

When a customer sent a photo of their property document, a consultant read and interpreted it manually. The more complex the ownership structure, the longer it took: depending on the consultant's skill, a 20-minute consultation could stretch to an hour. Senior consultants were faster and more accurate, and burned out faster because of it. Errors by junior consultants turned into customer complaints.

The problem operated on three levels simultaneously:

Variance in quality: Outcomes depended on which consultant you happened to reach
Capacity ceiling: Even the best consultants could only handle so many cases per day
Knowledge evaporation: When a consultant left, their accumulated judgment left with them

The AI Architecture: Four Layers

What we built is not a simple chatbot. Four layers work in concert: raw document parsing, legal knowledge retrieval, AI response generation, and a human-in-the-loop learning cycle.

Layer 01

OCR — Document Parsing

VLM-based OCR parses images of property registration documents. Ownership, mortgage (근저당), and jeonse right (전세권) entries are automatically converted into structured, queryable data.
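As a rough illustration of what "structured, queryable data" could look like, the sketch below models parsed entries as plain records and runs two simple queries over them. The `RegistryEntry` type, its field names, and the sample values are assumptions made for illustration; the article does not describe the actual schema or the VLM output format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RegistryEntry:
    """One right recorded on a registration document (hypothetical schema)."""
    kind: str                   # e.g. "ownership", "mortgage", "jeonse"
    holder: str                 # party holding the right
    amount_krw: Optional[int]   # secured amount, if the entry carries one


def entries_of_kind(entries: List[RegistryEntry], kind: str) -> List[RegistryEntry]:
    """Return only the entries of the requested kind."""
    return [e for e in entries if e.kind == kind]


def total_secured_amount(entries: List[RegistryEntry]) -> int:
    """Sum the amounts of all entries that carry one."""
    return sum(e.amount_krw for e in entries if e.amount_krw is not None)


# Made-up sample standing in for one parsed document.
sample_entries = [
    RegistryEntry("ownership", "Owner A", None),
    RegistryEntry("mortgage", "Bank B", 240_000_000),
    RegistryEntry("jeonse", "Tenant C", 150_000_000),
]

mortgages = entries_of_kind(sample_entries, "mortgage")
secured_total = total_secured_amount(sample_entries)
```

Once entries are in this shape, downstream layers can filter and aggregate them without re-reading the image.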

Layer 02

RAG — Legal Knowledge Retrieval

Precedents, statutes, and internal consultation knowledge are indexed in a Vector DB. Semantically relevant information is retrieved in real time for each query.
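Semantic retrieval of this kind usually reduces to ranking stored vectors by similarity to a query vector. The sketch below shows that ranking step with hand-written toy vectors and cosine similarity. It is not the system's actual retrieval code: the document IDs, the vectors, and the `top_k` helper are hypothetical, and a real deployment would use an embedding model and a Vector DB instead of an in-memory dictionary.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          index: Dict[str, Sequence[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Return the k document IDs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy index: three documents with made-up 3-dimensional "embeddings".
toy_index = {
    "precedent:mortgage-priority": [0.9, 0.1, 0.0],
    "statute:jeonse-protection": [0.1, 0.9, 0.1],
    "internal-note:red-flags": [0.3, 0.3, 0.8],
}

query = [0.8, 0.2, 0.1]  # stands in for an embedded customer question
results = top_k(query, toy_index, k=2)
```

With these toy vectors, the mortgage precedent ranks first because its vector points in nearly the same direction as the query.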

Layer 03

AI Chatbot — Response Generation

The chatbot combines OCR-extracted data with RAG results to answer customer queries immediately, giving consistent responses at senior-consultant quality without the consultant bottleneck.
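One common way to combine the two inputs is to place the extracted facts and the retrieved passages into a single prompt for the language model. The sketch below shows only that assembly step, with no model call. The `build_prompt` function, the prompt wording, and the sample strings are assumptions for illustration, not the prompt used in the project.

```python
from typing import List


def build_prompt(question: str,
                 registry_facts: List[str],
                 passages: List[str]) -> str:
    """Assemble one prompt from extracted facts, retrieved passages, and the question."""
    facts_block = "\n".join(f"- {fact}" for fact in registry_facts)
    if not facts_block:
        facts_block = "- (none extracted)"
    # Number the passages so an answer can point back to its source.
    passages_block = "\n".join(
        f"[{i}] {passage}" for i, passage in enumerate(passages, start=1)
    )
    if not passages_block:
        passages_block = "(none retrieved)"
    return (
        "Answer using only the registry facts and reference passages below.\n\n"
        f"Registry facts:\n{facts_block}\n\n"
        f"Reference passages:\n{passages_block}\n\n"
        f"Customer question: {question}\n"
    )


example_prompt = build_prompt(
    "Is my deposit at risk?",
    ["mortgage held by Bank B", "jeonse right held by Tenant C"],
    ["Toy passage about mortgage priority.", "Toy passage about deposits."],
)
```

Keeping the assembly in one function makes it easy to log exactly what the model saw when a consultant later reviews a flagged answer.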

Layer 04

Human-in-the-Loop (Human Feedback Loop)

A feedback structure where humans intervene in real time to correct AI errors and evasions. Detect → Alert consultant → Direct response → Immediate logic update. The system gets smarter with every use.

System Flow

The D → I → D loop is the heart of the system. The moment the AI gives an evasive answer such as "please consult a specialist", the evasion is detected automatically and a consultant is alerted. What the consultant says in response becomes the system's next training signal.
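The loop can be pictured as detect, alert, answer, and record. The sketch below is a deliberately simplified version under stated assumptions: evasions are detected by matching a short phrase list, the "alert" is an in-memory list, and the "logic update" is a lookup table of consultant answers. None of these mechanisms are described in the article, and all names (`EVASION_MARKERS`, `FeedbackLoop`, the stub functions) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumed marker phrases; a real detector would likely be more robust.
EVASION_MARKERS = ("please consult a specialist", "consult an expert")


def is_evasive(answer: str) -> bool:
    """Flag answers that deflect instead of answering."""
    lowered = answer.lower()
    return any(marker in lowered for marker in EVASION_MARKERS)


@dataclass
class FeedbackLoop:
    ai_answer: Callable[[str], str]        # stand-in for the chatbot
    ask_consultant: Callable[[str], str]   # stand-in for the human consultant
    corrections: Dict[str, str] = field(default_factory=dict)
    alerts: List[str] = field(default_factory=list)

    def respond(self, question: str) -> str:
        # Reuse a consultant's earlier answer if this question was corrected before.
        if question in self.corrections:
            return self.corrections[question]
        answer = self.ai_answer(question)
        if not is_evasive(answer):
            return answer
        # Evasion detected: alert, let the consultant answer, record the answer.
        self.alerts.append(question)
        human_answer = self.ask_consultant(question)
        self.corrections[question] = human_answer
        return human_answer


def stub_ai(question: str) -> str:
    return "Please consult a specialist."


def stub_consultant(question: str) -> str:
    return "Consultant answer for: " + question


loop = FeedbackLoop(ai_answer=stub_ai, ask_consultant=stub_consultant)
first = loop.respond("Is the mortgage senior to my deposit?")
second = loop.respond("Is the mortgage senior to my deposit?")
```

In this toy version the second call is served from the recorded correction, so the consultant is alerted only once for the same question.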

The Quality Growth Curve Human-in-the-Loop Built

When we first deployed, the internal benchmark quality score was 35. Frankly, that's not production-ready. And that's exactly the point. Our strategy wasn't "get it perfect before launch." It was "fail fast, fix fast."

Before diving into the numbers, here's how the score was calculated. A panel of Client A's internal consultants evaluated the system every cycle across 4 categories and 11 items.

Evaluation Category	Key Items
Overall Satisfaction	Overall consultation experience · Response speed · Ease of use
Answer Quality	Intent alignment · Practical usefulness · Trustworthiness · Consistency · Clarity
Usability	Admin dashboard usability · Handling of ambiguous queries
Readiness	Ready to serve real customers (real estate agents) at current version?

Each item rated on a 5-point scale → weighted average by category → normalized to 100. Score of 80+ defined as production-ready threshold.
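The scoring pipeline described above can be sketched as follows. Two details are assumptions, since the article does not state them: the categories are weighted equally here, and the 5-point mean is mapped linearly so that 1 becomes 0 and 5 becomes 100. The ratings are made up for illustration.

```python
from typing import Dict, List


def category_mean(ratings: List[int]) -> float:
    """Mean of the 5-point ratings within one category."""
    if not ratings:
        raise ValueError("a category needs at least one rating")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    return sum(ratings) / len(ratings)


def quality_score(ratings_by_category: Dict[str, List[int]],
                  weights: Dict[str, float]) -> float:
    """Weighted average of category means, normalized to 0-100."""
    total_weight = sum(weights[name] for name in ratings_by_category)
    weighted_sum = sum(weights[name] * category_mean(ratings)
                       for name, ratings in ratings_by_category.items())
    mean_on_scale = weighted_sum / total_weight
    # Assumed normalization: 1 -> 0, 5 -> 100.
    return (mean_on_scale - 1) / 4 * 100


# 11 made-up ratings across the 4 categories listed in the rubric.
example_ratings = {
    "Overall Satisfaction": [3, 4, 5],
    "Answer Quality": [4, 4, 4, 4, 4],
    "Usability": [3, 5],
    "Readiness": [4],
}
equal_weights = {name: 0.25 for name in example_ratings}  # assumed weights

example_score = quality_score(example_ratings, equal_weights)
meets_threshold = example_score >= 80  # 80+ is the stated production threshold
```

Under these assumptions, every category averaging 4 out of 5 yields 75, which is still below the 80-point launch threshold.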

Over 3 months, the loop ran 180 times. Every evasion or incorrect response triggered a consultant intervention, which fed back into the system. The score progression told the story:

Period	Quality Score (Internal Benchmark)	Key Improvements
Initial deployment	35	Basic OCR + RAG integration; many evasive responses
Month 1	55	Strengthened rights interpretation logic; fixed mortgage-related errors
Month 2	62	Improved case precedent accuracy; learned jeonse right (전세권) dispute patterns
Month 3	82	Complex rights handling; proactive risk signal detection added

Evaluation criteria: weighted composite of response accuracy, evasion rate, and customer satisfaction (Client A internal data, Nov 2025)

[Figure: Quality score rising from 35 to 82 in 3 months, over 180 Human-in-the-Loop iterations. Internal benchmark, Client A, Nov 2025.]

The gains were not uniform: the score rose 20 points in month 1, 7 in month 2, and 20 again in month 3. The late jump came because the edge cases accumulated early on became the foundation for more robust improvements later. The instinct to hide a 35-point system is understandable, but letting it fail early in a controlled environment is what gets you to 82.
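The month-over-month gains can be read directly off the score table. The short sketch below does that arithmetic using the four reported scores; the helper name `period_gains` is ours.

```python
from typing import Dict

# Scores as reported in the table above.
scores = {
    "Initial deployment": 35,
    "Month 1": 55,
    "Month 2": 62,
    "Month 3": 82,
}


def period_gains(scores_by_period: Dict[str, int]) -> Dict[str, int]:
    """Gain of each period over the one before it (dict order is preserved)."""
    labels = list(scores_by_period)
    values = list(scores_by_period.values())
    return {labels[i]: values[i] - values[i - 1] for i in range(1, len(values))}


gains = period_gains(scores)      # {"Month 1": 20, "Month 2": 7, "Month 3": 20}
total_gain = sum(gains.values())  # 47
```

The per-month figures sum to the 47-point total reported in the results table.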

Results in Numbers

Metric	Before	After	Change
Total daily consultations	Baseline	3x+	+200%
Senior consultant daily cases	Baseline	5x+	+400%
Quality benchmark score	35	82	+47 points
Consultant-to-consultant variance	High	Low	Equalized quality

Client A measured data — 3 months post-deployment, Nov 2025
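The "Change" column follows from the multipliers: a throughput of m times the baseline is a change of (m − 1) × 100 percent. The sketch below checks that arithmetic and the stated 80-point launch threshold; the function names are ours, and because the table reports "3x+" and "5x+", the percentages are lower bounds.

```python
PRODUCTION_THRESHOLD = 80  # launch threshold stated in the article


def percent_change(multiplier: float) -> float:
    """Percent change implied by a multiple of the baseline."""
    return (multiplier - 1.0) * 100.0


def is_production_ready(score: float) -> bool:
    """True when a benchmark score meets the launch threshold."""
    return score >= PRODUCTION_THRESHOLD


total_change = percent_change(3)    # 200.0, matching "+200%"
senior_change = percent_change(5)   # 400.0, matching "+400%"
month3_ready = is_production_ready(82)
month2_ready = is_production_ready(62)
```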

When senior consultant throughput increases 5x, that doesn't mean they worked 5x harder. It means AI handled the repetitive document analysis and basic responses, freeing the consultant to focus exclusively on cases that genuinely need human judgment. That's the right direction — not AI replacing people, but AI making people's work more meaningful.

Closing Thoughts

AI that reads property documents and consults on them is no longer a hypothetical. Client A's consultants work on top of this system every day.

What struck me most about this project wasn't the numbers — it was the shift in attitude. Consultants who started out skeptical ("can AI really match our quality?") were saying three months later, "let me check with the AI first on this one." The system had been accepted as a colleague.

There's a broader pattern here I keep seeing across industries: the organizations that build the fastest feedback loop between AI output and human judgment end up with the best systems. The technology is almost secondary. The loop design is everything.

Inquiries and demo requests: algorithmlabs.ai
