+3 votes
in Client API by

I'm working on a retry+backoff solution for the :cognitect.anomalies/unavailable "Loading Database" exception that we get after our ions restart. If this is a common issue, as the troubleshooting docs suggest, I'm wondering how others are handling it.

My current approach, after getting the connection with d/connect, is to wrap a simple liveness check (something like (d/db-stats (d/db conn))) in a try/catch. I've also considered running a d/q query for a specific datom that I know is present.
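
A minimal sketch of that liveness check, assuming the client API throws an ExceptionInfo whose ex-data carries the anomaly category. The name check-connection and the injected connect-fn/probe-fn arguments are hypothetical, chosen so the snippet runs without a real Datomic system:

```clojure
(defn check-connection
  "Hypothetical helper (not part of any Datomic API). Calls connect-fn to get a
  connection, then probe-fn on it. Returns {:conn c} on success, or
  {:anomaly category :ex e} when either step throws an ExceptionInfo."
  [connect-fn probe-fn]
  (try
    (let [conn (connect-fn)]
      (probe-fn conn)
      {:conn conn})
    (catch clojure.lang.ExceptionInfo ex
      {:anomaly (:cognitect.anomalies/category (ex-data ex))
       :ex      ex})))

;; With the client API this might be called as follows
;; (client and the db name are assumed to exist):
;; (check-connection #(d/connect client {:db-name "my-db"})
;;                   #(d/db-stats (d/db %)))

;; Stand-in functions that simulate a database that is still loading:
(def loading-result
  (check-connection
   (fn [] :fake-conn)
   (fn [_] (throw (ex-info "Loading database"
                           {:cognitect.anomalies/category
                            :cognitect.anomalies/unavailable})))))

(:anomaly loading-result)
```

Returning the anomaly as data rather than rethrowing leaves the retry decision to the caller, which is one possible design among several.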

I've also wondered whether the ion has some other mechanism for rejecting incoming HTTP or Lambda requests after a restart until the Datomic connection is viable.

Any thoughts or other ideas? Thanks for your input!

2 Answers

+1 vote
by

It's been nearly a day, so I thought I'd share the solution I arrived at.

(ns my.app.datomic
  (:require
   [datomic.client.api :as d]
   [datomic.ion :as ion]
   [my.app.datomic.ion :as my.ion]))

(def client
  (delay
   (->>
    (ion/get-env)
    (my.ion/client))))

(def retryable-anom?
  "See https://github.com/cognitect-labs/anomalies#the-categories"
  #{:cognitect.anomalies/busy
    :cognitect.anomalies/interrupted
    :cognitect.anomalies/unavailable})

(defn -retry-conn?
  "See https://docs.datomic.com/cloud/troubleshooting.html#loading-db"
  [ex]
  (-> (ex-data ex)
      (:cognitect.anomalies/category)
      (retryable-anom?)))

(defn -connect
  [opts connect-ex-data]
  (try
    (let [connection (d/connect @client opts)
          res        (d/db-stats (d/db connection))]
      (if res
        [connection nil]
        [nil (ex-info "Unable to get db stats." connect-ex-data)]))
    (catch Exception ex
      [nil ex])))

(def conn
  (memoize
   (fn [db-name-ident]
     (let [db-name         (db-name-ident (ion/get-env))
           opts            {:db-name db-name}
           connect-ex-data (merge opts {:db-name-ident db-name-ident})]
       (loop [tries   3
              wait-ms 50
              ex      nil]
         (if-not (zero? tries)
           (let [[connection ex] (-connect opts connect-ex-data)]
             (cond
               (nil? ex)
               connection

               (-retry-conn? ex)
               (do
                 (Thread/sleep wait-ms)
                 (recur (dec tries) (* wait-ms 10) ex))

               :else
               (throw ex)))
           (throw
            (or
             ex
             (ex-info
              "Unable to connect to db. Retry count exhausted."
              connect-ex-data)))))))))
by
You'll probably want to use delay/force rather than memoize.

Also, the ion-starter project includes a with-retry helper, which is the recommended approach.

See https://github.com/Datomic/ion-starter/blob/master/src/datomic/ion/starter/utils.clj
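
One way to read the delay/force suggestion, as a sketch only: keep one delay per database key in an atom, so the connect function runs at most once per key even with concurrent callers. The names conn-delays, cached-conn, and fake-connect are hypothetical and a stand-in replaces the real d/connect call:

```clojure
;; Registry of one delay per database key.
(defonce conn-delays (atom {}))

(defn cached-conn
  "Hypothetical helper. Returns the value produced by (connect-fn k), running
  connect-fn at most once per key k, even under concurrent callers."
  [k connect-fn]
  (let [delays (swap! conn-delays
                      (fn [m]
                        (if (contains? m k)
                          m
                          ;; Creating a delay has no side effects, so it is
                          ;; safe if swap! retries this function.
                          (assoc m k (delay (connect-fn k))))))]
    (force (get delays k))))

;; Demo with a stand-in connect function that counts its invocations.
(def calls (atom 0))

(defn fake-connect [k]
  (swap! calls inc)
  {:db-name k})

(cached-conn :my-db fake-connect)
(cached-conn :my-db fake-connect)
@calls ; the stand-in ran only once for :my-db
```

To my understanding, a delay whose body throws keeps rethrowing that same exception on later derefs, whereas memoize caches nothing when the function throws, so with this pattern any retry loop would need to live inside connect-fn.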
by
Well that's definitely helpful! It can be hard to find this kind of stuff, thanks for posting it!
+1 vote
by

We use something very similar to the code below. Most of it originates from aws-api. The retry code is generic and can be used for all sorts of things.

Keep in mind that where you want to retry often depends on the use case.

(defn with-retry-sync
  "Calls work-fn until retriable? is false or backoff returns nil. If work-fn
  throws, the exception will be passed to retriable?. If it is not retriable, the
  exception will be thrown. work-fn is a function of no arguments. retriable? is
  passed the result or exception from calling work-fn. backoff is a function of
  the number of times work-fn has been called."
  [work-fn retriable? backoff]
  (let [maybe-throw #(if (instance? Throwable %) (throw %) %)]
    (loop [retries 0]
      (let [resp (try (work-fn) (catch Throwable t t))]
        (if (retriable? resp)
          (if-let [bo (backoff retries)]
            (do
              (Thread/sleep bo)
              (recur (inc retries)))
            (maybe-throw resp))
          (maybe-throw resp))))))

(defn capped-exponential-backoff
  "Returns a function of the num-retries (so far), which returns the
  lesser of max-backoff and an exponentially increasing multiple of
  base, or nil when (>= num-retries max-retries).
  See with-retry-sync to see how it is used.
  Alpha. Subject to change."
  ([] (capped-exponential-backoff 100 10000 8))
  ([base max-backoff max-retries]
   (fn [num-retries]
     (when (< num-retries max-retries)
       (min max-backoff
         (* base (bit-shift-left 1 num-retries)))))))

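(defn capped-exponential-backoff-with-jitter
  "Like capped-exponential-backoff, but adds a random jitter of less than
  max-jitter-ms to each delay. The result is still capped at max-backoff."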
  ([] (capped-exponential-backoff-with-jitter {}))
  ([{:keys [base
            max-backoff
            max-retries
            max-jitter-ms]
     :or   {base          100
            max-backoff   20000
            max-retries   4
            max-jitter-ms 100}}]
   (let [backoff-fn (capped-exponential-backoff
                      base
                      max-backoff
                      max-retries)]
     (fn [num-retries]
       (when-let [backoff-ms (backoff-fn num-retries)]
         ;; adding "jitter" can help reduce throttling on retries:
         ;; https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/
         (min (+ backoff-ms (rand-int max-jitter-ms)) max-backoff))))))

(defn default-retriable?
  "Returns true if x is an anomaly map whose category is busy or unavailable,
  or an ExceptionInfo with such an anomaly in its ex-data."
  [x]
  (or
    (contains? #{:cognitect.anomalies/busy
                 :cognitect.anomalies/unavailable}
      (:cognitect.anomalies/category x))
    (and (instance? clojure.lang.ExceptionInfo x)
      (default-retriable? (ex-data x)))))

(defn connect
  [client arg-map]
  (with-retry-sync #(d/connect client arg-map)
    default-retriable? (capped-exponential-backoff-with-jitter)))
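
To make the defaults concrete, the delays produced by the backoff function can be tabulated. This is only an illustration: the snippet repeats capped-exponential-backoff from above so that it runs on its own, and default-schedule and jitter-base-schedule are names invented here.

```clojure
;; Copy of capped-exponential-backoff from the answer, repeated so the
;; snippet runs on its own.
(defn capped-exponential-backoff
  [base max-backoff max-retries]
  (fn [num-retries]
    (when (< num-retries max-retries)
      (min max-backoff
           (* base (bit-shift-left 1 num-retries))))))

;; Delays (ms) for the zero-argument defaults:
;; base 100, max-backoff 10000, max-retries 8.
(def default-schedule
  (mapv (capped-exponential-backoff 100 10000 8) (range 9)))
;; => [100 200 400 800 1600 3200 6400 10000 nil]

;; Delays (ms) for the jitter variant's defaults, before jitter is added:
;; base 100, max-backoff 20000, max-retries 4.
(def jitter-base-schedule
  (mapv (capped-exponential-backoff 100 20000 4) (range 5)))
;; => [100 200 400 800 nil]
```

With the jitter defaults, connect therefore makes up to five attempts and sleeps for about 1.5 seconds in total, plus less than 100 ms of jitter per retry, before the last exception is rethrown. Whether that is long enough for a database that is still loading depends on the system, so max-retries and max-backoff may need tuning.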
...