Error Recovery Example

This example demonstrates advanced error recovery patterns and resilient bot design using BotRS.

Overview

Building a robust QQ Guild bot requires comprehensive error handling and recovery mechanisms. This example shows how to implement retry logic, circuit breakers, graceful degradation, and automatic recovery strategies.

Basic Error Recovery

Retry with Exponential Backoff

rust

use botrs::{Client, Context, EventHandler, Intents, Message, Ready, Token, BotError};
use std::time::Duration;
use tokio::time::sleep;

struct ResilientBot {
    max_retries: u32,
    base_delay: Duration,
}

impl ResilientBot {
    pub fn new() -> Self {
        Self {
            max_retries: 3,
            base_delay: Duration::from_millis(1000),
        }
    }

    async fn retry_with_backoff<F, T>(&self, mut operation: F) -> Result<T, BotError>
    where
        F: FnMut() -> std::pin::Pin<Box<dyn std::future::Future<Output = Result<T, BotError>> + Send>>,
    {
        let mut last_error = None;
        
        for attempt in 0..=self.max_retries {
            match operation().await {
                Ok(result) => return Ok(result),
                Err(error) => {
                    last_error = Some(error);
                    
                    if attempt < self.max_retries {
                        let delay = self.base_delay * 2_u32.pow(attempt);
                        tracing::warn!(
                            "Operation failed (attempt {}/{}), retrying in {:?}",
                            attempt + 1,
                            self.max_retries + 1,
                            delay
                        );
                        sleep(delay).await;
                    }
                }
            }
        }
        
        Err(last_error.unwrap())
    }
}

#[async_trait::async_trait]
impl EventHandler for ResilientBot {
    async fn ready(&self, _ctx: Context, ready: Ready) {
        tracing::info!("Resilient bot ready: {}", ready.user.username);
    }

    async fn message_create(&self, ctx: Context, message: Message) {
        if message.is_from_bot() {
            return;
        }

        if let Some(content) = &message.content {
            if content.trim() == "!test-retry" {
                let result = self.retry_with_backoff(|| {
                    let ctx = ctx.clone();
                    let message = message.clone();
                    Box::pin(async move {
                        message.reply(&ctx.api, &ctx.token, "This might fail but we'll retry!").await
                    })
                }).await;

                match result {
                    Ok(_) => tracing::info!("Message sent successfully after retries"),
                    Err(e) => tracing::error!("Failed to send message after all retries: {}", e),
                }
            }
        }
    }

    async fn error(&self, error: BotError) {
        tracing::error!("Bot error occurred: {}", error);
        
        // Implement error-specific recovery strategies
        match &error {
            BotError::Network(_) => {
                tracing::info!("Network error detected, implementing recovery strategy");
                // Network-specific recovery logic
            }
            BotError::RateLimited(_) => {
                tracing::info!("Rate limit hit, backing off");
                // Rate limit recovery logic
            }
            BotError::Gateway(_) => {
                tracing::info!("Gateway error, preparing for reconnection");
                // Gateway recovery logic
            }
            _ => {
                tracing::warn!("Unhandled error type: {}", error);
            }
        }
    }
}

Circuit Breaker Pattern

rust

use std::sync::Arc;
use tokio::sync::Mutex;
use std::time::{Duration, Instant};

#[derive(Debug, Clone)]
pub enum CircuitState {
    Closed,
    Open(Instant),
    HalfOpen,
}

pub struct CircuitBreaker {
    state: Arc<Mutex<CircuitState>>,
    failure_threshold: u32,
    recovery_timeout: Duration,
    failure_count: Arc<Mutex<u32>>,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, recovery_timeout: Duration) -> Self {
        Self {
            state: Arc::new(Mutex::new(CircuitState::Closed)),
            failure_threshold,
            recovery_timeout,
            failure_count: Arc::new(Mutex::new(0)),
        }
    }

    pub async fn call<F, T>(&self, operation: F) -> Result<T, BotError>
    where
        F: std::future::Future<Output = Result<T, BotError>>,
    {
        // Check circuit state
        let mut state = self.state.lock().await;
        match *state {
            CircuitState::Open(opened_at) => {
                if Instant::now().duration_since(opened_at) > self.recovery_timeout {
                    *state = CircuitState::HalfOpen;
                    tracing::info!("Circuit breaker transitioning to half-open");
                } else {
                    return Err(BotError::InternalError("Circuit breaker is open".to_string()));
                }
            }
            CircuitState::Closed | CircuitState::HalfOpen => {}
        }
        drop(state);

        // Execute operation
        match operation.await {
            Ok(result) => {
                self.on_success().await;
                Ok(result)
            }
            Err(error) => {
                self.on_failure().await;
                Err(error)
            }
        }
    }

    async fn on_success(&self) {
        let mut failure_count = self.failure_count.lock().await;
        *failure_count = 0;

        let mut state = self.state.lock().await;
        if matches!(*state, CircuitState::HalfOpen) {
            *state = CircuitState::Closed;
            tracing::info!("Circuit breaker closed after successful operation");
        }
    }

    async fn on_failure(&self) {
        let mut failure_count = self.failure_count.lock().await;
        *failure_count += 1;

        if *failure_count >= self.failure_threshold {
            let mut state = self.state.lock().await;
            *state = CircuitState::Open(Instant::now());
            tracing::warn!("Circuit breaker opened after {} failures", self.failure_threshold);
        }
    }
}

Graceful Degradation

rust

use std::collections::HashMap;

pub struct FeatureFlags {
    flags: HashMap<String, bool>,
}

impl FeatureFlags {
    pub fn new() -> Self {
        let mut flags = HashMap::new();
        flags.insert("rich_embeds".to_string(), true);
        flags.insert("file_uploads".to_string(), true);
        flags.insert("interactive_buttons".to_string(), true);
        flags.insert("external_api_calls".to_string(), true);
        
        Self { flags }
    }

    pub fn is_enabled(&self, feature: &str) -> bool {
        self.flags.get(feature).copied().unwrap_or(false)
    }

    pub fn disable_feature(&mut self, feature: &str) {
        tracing::warn!("Disabling feature: {}", feature);
        self.flags.insert(feature.to_string(), false);
    }

    pub fn enable_feature(&mut self, feature: &str) {
        tracing::info!("Enabling feature: {}", feature);
        self.flags.insert(feature.to_string(), true);
    }
}

struct DegradableBot {
    circuit_breaker: CircuitBreaker,
    feature_flags: Arc<Mutex<FeatureFlags>>,
}

impl DegradableBot {
    pub fn new() -> Self {
        Self {
            circuit_breaker: CircuitBreaker::new(5, Duration::from_secs(60)),
            feature_flags: Arc::new(Mutex::new(FeatureFlags::new())),
        }
    }

    async fn send_message_with_fallback(&self, ctx: &Context, message: &Message, content: &str) -> Result<(), BotError> {
        let flags = self.feature_flags.lock().await;
        
        // Try rich embed first
        if flags.is_enabled("rich_embeds") {
            drop(flags);
            let result = self.circuit_breaker.call(
                self.send_rich_message(ctx, message, content)
            ).await;
            
            if result.is_ok() {
                return result;
            }
            
            // Disable rich embeds on failure
            let mut flags = self.feature_flags.lock().await;
            flags.disable_feature("rich_embeds");
        }

        // Fallback to simple text message
        tracing::info!("Falling back to simple text message");
        message.reply(&ctx.api, &ctx.token, content).await
    }

    async fn send_rich_message(&self, ctx: &Context, message: &Message, content: &str) -> Result<(), BotError> {
        use botrs::models::message::{Embed, MessageParams};
        
        let embed = Embed {
            title: Some("Response".to_string()),
            description: Some(content.to_string()),
            color: Some(0x3498db),
            ..Default::default()
        };

        let params = MessageParams {
            embed: Some(embed),
            ..Default::default()
        };

        ctx.send_message(&message.channel_id, &params).await?;
        Ok(())
    }
}

Health Monitoring

rust

use std::sync::atomic::{AtomicU64, Ordering};
use chrono::{DateTime, Utc};

#[derive(Debug)]
pub struct HealthMetrics {
    pub total_messages: AtomicU64,
    pub successful_messages: AtomicU64,
    pub failed_messages: AtomicU64,
    pub last_error: Mutex<Option<(DateTime<Utc>, String)>>,
    pub uptime_start: DateTime<Utc>,
}

impl HealthMetrics {
    pub fn new() -> Self {
        Self {
            total_messages: AtomicU64::new(0),
            successful_messages: AtomicU64::new(0),
            failed_messages: AtomicU64::new(0),
            last_error: Mutex::new(None),
            uptime_start: Utc::now(),
        }
    }

    pub fn record_success(&self) {
        self.total_messages.fetch_add(1, Ordering::Relaxed);
        self.successful_messages.fetch_add(1, Ordering::Relaxed);
    }

    pub async fn record_failure(&self, error: &str) {
        self.total_messages.fetch_add(1, Ordering::Relaxed);
        self.failed_messages.fetch_add(1, Ordering::Relaxed);
        
        let mut last_error = self.last_error.lock().await;
        *last_error = Some((Utc::now(), error.to_string()));
    }

    pub fn success_rate(&self) -> f64 {
        let total = self.total_messages.load(Ordering::Relaxed);
        if total == 0 {
            return 1.0;
        }
        
        let successful = self.successful_messages.load(Ordering::Relaxed);
        successful as f64 / total as f64
    }

    pub fn is_healthy(&self) -> bool {
        let success_rate = self.success_rate();
        let total_messages = self.total_messages.load(Ordering::Relaxed);
        
        // Consider healthy if success rate > 95% and we've processed some messages
        success_rate > 0.95 || total_messages < 10
    }
}

struct MonitoredBot {
    health_metrics: Arc<HealthMetrics>,
    circuit_breaker: CircuitBreaker,
}

impl MonitoredBot {
    pub fn new() -> Self {
        Self {
            health_metrics: Arc::new(HealthMetrics::new()),
            circuit_breaker: CircuitBreaker::new(5, Duration::from_secs(30)),
        }
    }

    async fn handle_message_safely(&self, ctx: &Context, message: &Message) {
        let result = self.circuit_breaker.call(
            self.process_message(ctx, message)
        ).await;

        match result {
            Ok(_) => {
                self.health_metrics.record_success();
                tracing::debug!("Message processed successfully");
            }
            Err(e) => {
                self.health_metrics.record_failure(&e.to_string()).await;
                tracing::error!("Message processing failed: {}", e);
            }
        }
    }

    async fn process_message(&self, ctx: &Context, message: &Message) -> Result<(), BotError> {
        if let Some(content) = &message.content {
            match content.trim() {
                "!health" => {
                    self.send_health_report(ctx, message).await?;
                }
                "!status" => {
                    self.send_status_report(ctx, message).await?;
                }
                _ => {
                    // Process other commands
                }
            }
        }
        Ok(())
    }

    async fn send_health_report(&self, ctx: &Context, message: &Message) -> Result<(), BotError> {
        let metrics = &self.health_metrics;
        let total = metrics.total_messages.load(Ordering::Relaxed);
        let successful = metrics.successful_messages.load(Ordering::Relaxed);
        let failed = metrics.failed_messages.load(Ordering::Relaxed);
        let success_rate = metrics.success_rate();
        let uptime = Utc::now() - metrics.uptime_start;
        
        let health_status = if metrics.is_healthy() { "🟢 Healthy" } else { "🔴 Unhealthy" };
        
        let last_error = {
            let last_error_guard = metrics.last_error.lock().await;
            match &*last_error_guard {
                Some((timestamp, error)) => format!("**Last Error:** {} ({})", error, timestamp.format("%Y-%m-%d %H:%M:%S UTC")),
                None => "**Last Error:** None".to_string(),
            }
        };

        let report = format!(
            "🏥 **Bot Health Report**\n\n\
             **Status:** {}\n\
             **Uptime:** {} days, {} hours, {} minutes\n\
             **Messages Processed:** {}\n\
             **Success Rate:** {:.2}%\n\
             **Successful:** {}\n\
             **Failed:** {}\n\n\
             {}",
            health_status,
            uptime.num_days(),
            uptime.num_hours() % 24,
            uptime.num_minutes() % 60,
            total,
            success_rate * 100.0,
            successful,
            failed,
            last_error
        );

        message.reply(&ctx.api, &ctx.token, &report).await?;
        Ok(())
    }

    async fn send_status_report(&self, ctx: &Context, message: &Message) -> Result<(), BotError> {
        let status = if self.health_metrics.is_healthy() {
            "✅ All systems operational"
        } else {
            "⚠️ Some issues detected - check !health for details"
        };

        message.reply(&ctx.api, &ctx.token, status).await?;
        Ok(())
    }
}

Automatic Recovery Strategies

rust

use tokio::sync::oneshot;

pub struct RecoveryManager {
    recovery_channel: Option<oneshot::Sender<()>>,
}

impl RecoveryManager {
    pub fn new() -> Self {
        Self {
            recovery_channel: None,
        }
    }

    pub async fn start_recovery_monitoring(&mut self) {
        let (tx, mut rx) = oneshot::channel();
        self.recovery_channel = Some(tx);

        tokio::spawn(async move {
            loop {
                tokio::select! {
                    _ = &mut rx => {
                        tracing::info!("Recovery monitoring stopped");
                        break;
                    }
                    _ = tokio::time::sleep(Duration::from_secs(30)) => {
                        // Perform periodic health checks
                        Self::perform_health_check().await;
                    }
                }
            }
        });
    }

    async fn perform_health_check() {
        tracing::debug!("Performing automated health check");
        
        // Check various system components
        let network_ok = Self::check_network_connectivity().await;
        let memory_ok = Self::check_memory_usage().await;
        let api_ok = Self::check_api_connectivity().await;

        if !network_ok || !memory_ok || !api_ok {
            tracing::warn!("Health check failed, initiating recovery procedures");
            Self::initiate_recovery().await;
        }
    }

    async fn check_network_connectivity() -> bool {
        // Simple network check
        match reqwest::get("https://api.sgroup.qq.com").await {
            Ok(response) => response.status().is_success(),
            Err(_) => false,
        }
    }

    async fn check_memory_usage() -> bool {
        // Basic memory check (simplified)
        true // In a real implementation, check actual memory usage
    }

    async fn check_api_connectivity() -> bool {
        // Check if we can make basic API calls
        true // In a real implementation, test API endpoints
    }

    async fn initiate_recovery() {
        tracing::info!("Starting automatic recovery procedures");
        
        // Clear caches
        Self::clear_caches().await;
        
        // Reset connections
        Self::reset_connections().await;
        
        // Reduce load
        Self::reduce_load().await;
        
        tracing::info!("Recovery procedures completed");
    }

    async fn clear_caches() {
        tracing::debug!("Clearing internal caches");
        // Implementation depends on your caching strategy
    }

    async fn reset_connections() {
        tracing::debug!("Resetting network connections");
        // Close and reopen connections if needed
    }

    async fn reduce_load() {
        tracing::debug!("Reducing system load");
        // Temporarily disable non-essential features
    }
}

Complete Recovery Bot Example

rust

struct AdvancedRecoveryBot {
    health_metrics: Arc<HealthMetrics>,
    circuit_breaker: CircuitBreaker,
    feature_flags: Arc<Mutex<FeatureFlags>>,
    recovery_manager: Arc<Mutex<RecoveryManager>>,
}

impl AdvancedRecoveryBot {
    pub fn new() -> Self {
        Self {
            health_metrics: Arc::new(HealthMetrics::new()),
            circuit_breaker: CircuitBreaker::new(5, Duration::from_secs(60)),
            feature_flags: Arc::new(Mutex::new(FeatureFlags::new())),
            recovery_manager: Arc::new(Mutex::new(RecoveryManager::new())),
        }
    }

    pub async fn start_monitoring(&self) {
        let mut recovery_manager = self.recovery_manager.lock().await;
        recovery_manager.start_recovery_monitoring().await;
    }
}

#[async_trait::async_trait]
impl EventHandler for AdvancedRecoveryBot {
    async fn ready(&self, _ctx: Context, ready: Ready) {
        tracing::info!("Advanced recovery bot ready: {}", ready.user.username);
        
        // Start health monitoring
        self.start_monitoring().await;
    }

    async fn message_create(&self, ctx: Context, message: Message) {
        if message.is_from_bot() {
            return;
        }

        // Wrap all message processing in error recovery
        let result = self.circuit_breaker.call(async {
            self.process_message_with_recovery(&ctx, &message).await
        }).await;

        // Record metrics
        match result {
            Ok(_) => self.health_metrics.record_success(),
            Err(e) => {
                self.health_metrics.record_failure(&e.to_string()).await;
                tracing::error!("Message processing failed: {}", e);
            }
        }
    }

    async fn error(&self, error: BotError) {
        tracing::error!("Bot error: {}", error);
        
        // Record the error
        self.health_metrics.record_failure(&error.to_string()).await;
        
        // Implement specific recovery strategies based on error type
        match error {
            BotError::RateLimited(_) => {
                tracing::info!("Rate limited - enabling backoff mode");
                let mut flags = self.feature_flags.lock().await;
                flags.disable_feature("external_api_calls");
                
                // Re-enable after delay
                let flags_clone = Arc::clone(&self.feature_flags);
                tokio::spawn(async move {
                    tokio::time::sleep(Duration::from_secs(300)).await; // 5 minutes
                    let mut flags = flags_clone.lock().await;
                    flags.enable_feature("external_api_calls");
                });
            }
            BotError::Network(_) => {
                tracing::info!("Network error - entering degraded mode");
                let mut flags = self.feature_flags.lock().await;
                flags.disable_feature("file_uploads");
                flags.disable_feature("rich_embeds");
            }
            BotError::Gateway(_) => {
                tracing::info!("Gateway error - preparing for reconnection");
                // Gateway errors are usually handled by the client automatically
            }
            _ => {
                tracing::warn!("Unhandled error type, using default recovery");
            }
        }
    }
}

impl AdvancedRecoveryBot {
    async fn process_message_with_recovery(&self, ctx: &Context, message: &Message) -> Result<(), BotError> {
        if let Some(content) = &message.content {
            match content.trim() {
                "!health" => self.send_comprehensive_health_report(ctx, message).await?,
                "!recover" => self.manual_recovery(ctx, message).await?,
                "!features" => self.show_feature_status(ctx, message).await?,
                _ => {
                    // Process other commands with fallback
                    self.send_message_with_all_fallbacks(ctx, message, "Command processed with recovery protection").await?;
                }
            }
        }
        Ok(())
    }

    async fn send_comprehensive_health_report(&self, ctx: &Context, message: &Message) -> Result<(), BotError> {
        let flags = self.feature_flags.lock().await;
        let features_status = format!(
            "**Features:**\n\
             • Rich Embeds: {}\n\
             • File Uploads: {}\n\
             • Interactive Buttons: {}\n\
             • External API Calls: {}",
            if flags.is_enabled("rich_embeds") { "✅" } else { "❌" },
            if flags.is_enabled("file_uploads") { "✅" } else { "❌" },
            if flags.is_enabled("interactive_buttons") { "✅" } else { "❌" },
            if flags.is_enabled("external_api_calls") { "✅" } else { "❌" }
        );
        drop(flags);

        // Get basic health metrics
        let success_rate = self.health_metrics.success_rate() * 100.0;
        let total = self.health_metrics.total_messages.load(Ordering::Relaxed);
        let health_emoji = if self.health_metrics.is_healthy() { "🟢" } else { "🔴" };

        let comprehensive_report = format!(
            "{} **Comprehensive Health Report**\n\n\
             **Overall Status:** {}\n\
             **Success Rate:** {:.1}%\n\
             **Total Messages:** {}\n\n\
             {}",
            health_emoji,
            if self.health_metrics.is_healthy() { "Healthy" } else { "Needs Attention" },
            success_rate,
            total,
            features_status
        );

        self.send_message_with_all_fallbacks(ctx, message, &comprehensive_report).await
    }

    async fn manual_recovery(&self, ctx: &Context, message: &Message) -> Result<(), BotError> {
        tracing::info!("Manual recovery initiated by user");
        
        // Reset all features
        let mut flags = self.feature_flags.lock().await;
        flags.enable_feature("rich_embeds");
        flags.enable_feature("file_uploads");
        flags.enable_feature("interactive_buttons");
        flags.enable_feature("external_api_calls");
        drop(flags);

        message.reply(&ctx.api, &ctx.token, "🔄 Manual recovery completed! All features restored.").await
    }

    async fn show_feature_status(&self, ctx: &Context, message: &Message) -> Result<(), BotError> {
        let flags = self.feature_flags.lock().await;
        let status = format!(
            "🎛️ **Feature Status**\n\n\
             Rich Embeds: {}\n\
             File Uploads: {}\n\
             Interactive Buttons: {}\n\
             External API Calls: {}",
            if flags.is_enabled("rich_embeds") { "✅ Enabled" } else { "❌ Disabled" },
            if flags.is_enabled("file_uploads") { "✅ Enabled" } else { "❌ Disabled" },
            if flags.is_enabled("interactive_buttons") { "✅ Enabled" } else { "❌ Disabled" },
            if flags.is_enabled("external_api_calls") { "✅ Enabled" } else { "❌ Disabled" }
        );
        drop(flags);

        message.reply(&ctx.api, &ctx.token, &status).await
    }

    async fn send_message_with_all_fallbacks(&self, ctx: &Context, message: &Message, content: &str) -> Result<(), BotError> {
        // Try with circuit breaker protection
        let result = self.circuit_breaker.call(async {
            message.reply(&ctx.api, &ctx.token, content).await
        }).await;

        if result.is_ok() {
            return result;
        }

        // Ultimate fallback - simple reply without any enhancements
        tracing::warn!("All enhanced messaging failed, using basic fallback");
        tokio::time::sleep(Duration::from_millis(100)).await; // Brief delay
        message.reply(&ctx.api, &ctx.token, "⚠️ Message sent in recovery mode").await
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize logging
    tracing_subscriber::fmt()
        .with_env_filter("botrs=debug,error_recovery=info")
        .init();

    tracing::info!("Starting error recovery bot...");

    // Get credentials
    let app_id = std::env::var("QQ_BOT_APP_ID")
        .expect("QQ_BOT_APP_ID environment variable required");
    let secret = std::env::var("QQ_BOT_SECRET")
        .expect("QQ_BOT_SECRET environment variable required");

    // Create token
    let token = Token::new(app_id, secret);
    token.validate()?;

    // Set up intents
    let intents = Intents::default()
        .with_public_guild_messages()
        .with_guilds();

    // Create resilient bot
    let handler = AdvancedRecoveryBot::new();
    let mut client = Client::new(token, intents, handler, false)?;

    tracing::info!("Error recovery bot starting...");
    client.start().await?;

    Ok(())
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224

Usage Examples

Testing Recovery Features

# Check bot health
!health

# View feature status
!features

# Trigger manual recovery
!recover

# Get basic status
!status

Monitoring Commands

# Health metrics and uptime
!health

# Circuit breaker status
!status

# Feature availability
!features

Best Practices

Layered Recovery: Implement multiple fallback levels
Monitoring: Continuously monitor system health
Graceful Degradation: Disable non-essential features under stress
Circuit Breakers: Prevent cascade failures
Automatic Recovery: Self-healing mechanisms
Observability: Comprehensive logging and metrics
Testing: Regularly test recovery scenarios

Recovery Strategies

Immediate: Retry with exponential backoff
Short-term: Circuit breakers and feature flags
Medium-term: Graceful degradation and load reduction
Long-term: Health monitoring and automatic recovery

Error Recovery Example ​

Overview ​

Basic Error Recovery ​

Retry with Exponential Backoff ​

Circuit Breaker Pattern ​

Graceful Degradation ​

Health Monitoring ​

Automatic Recovery Strategies ​

Complete Recovery Bot Example ​

Usage Examples ​

Testing Recovery Features ​

Monitoring Commands ​

Best Practices ​

Recovery Strategies ​

See Also ​

Error Recovery Example

Overview

Basic Error Recovery

Retry with Exponential Backoff

Circuit Breaker Pattern

Graceful Degradation

Health Monitoring

Automatic Recovery Strategies

Complete Recovery Bot Example

Usage Examples

Testing Recovery Features

Monitoring Commands

Best Practices

Recovery Strategies

See Also