I could use some help debugging recent connection issues we're having with
a Datomic Cloud Solo topology. We were working to add analytics support, so we enabled it in
CloudFormation, added and synced a config directory, and adjusted our internal AWS
subaccounts to access it via IAM roles.

After this, we're unable to establish a connection through the client, even though
it seems like all of the parts are working and wired up. I did some
debugging via Google and tried the suggestion of restarting the worker and gateway nodes, but
unfortunately that didn't help.

If anyone could provide pointers on what to poke or try next to re-establish
authentication, it would be much appreciated.

Now for the details. First, we successfully connect to the access gateway:

```
$ datomic client access -r us-east-2 datomic-solo
OpenSSH_7.6p1 Ubuntu-4ubuntu0.3, OpenSSL 1.0.2n  7 Dec 2017
[...]
debug1: Authentication succeeded (publickey).
Authenticated to 18.216.250.130 ([18.216.250.130]:22).
debug1: Local connections to LOCALHOST:8182 forwarded to remote address socks:0
debug1: Local forwarding listening on ::1 port 8182.
debug1: channel 0: new [port listener]
debug1: Local forwarding listening on 127.0.0.1 port 8182.
debug1: channel 1: new [port listener]
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
debug1: pledge: exec
debug1: client_input_global_request: rtype hostkeys-00@openssh.com want_reply 0
```

The gateway connection seems to work:

```
$ curl -x socks5h://localhost:8182 http://entry.datomic-solo.us-east-2.datomic.net:8182/
{:s3-auth-path "datomic-solo-storagef7f305e7-wutene4nee-s3datomic-ci7fce3hk2gj"}
```

But when trying to connect in Clojure with `(d/client cfg)`, it times out:

```
Execution error (ExceptionInfo) at datomic.client.impl.cloud/get-s3-auth-path (cloud.clj:179).
Unable to connect to localhost:8182

ginkgo.metadata.core=> *e
#error {
 :cause "Unable to connect to localhost:8182"
 :data {:cognitect.anomalies/category :cognitect.anomalies/unavailable, :cognitect.anomalies/message "Total timeout 60000 ms elapsed", :config {:server-type :cloud, :region "us-east-2", :system "datomic-solo", :endpoint "http://entry.datomic-solo.us-east-2.datomic.net:8182", :proxy-port 8182, :endpoint-map {:headers {"host" "entry.datomic-solo.us-east-2.datomic.net:8182"}, :scheme "http", :server-name "entry.datomic-solo.us-east-2.datomic.net", :server-port 8182}}}
 :via
 [{:type java.lang.RuntimeException
   :message "could not start [#'ginkgo.metadata.ferment/conn] due to"
   :at [mount.core$up$fn__247 invoke "core.cljc" 94]}
  {:type clojure.lang.ExceptionInfo
   :message "Unable to connect to localhost:8182"
   :data {:cognitect.anomalies/category :cognitect.anomalies/unavailable, :cognitect.anomalies/message "Total timeout 60000 ms elapsed", :config {:server-type :cloud, :region "us-east-2", :system "datomic-solo", :endpoint "http://entry.datomic-solo.us-east-2.datomic.net:8182", :proxy-port 8182, :endpoint-map {:headers {"host" "entry.datomic-solo.us-east-2.datomic.net:8182"}, :scheme "http", :server-name "entry.datomic-solo.us-east-2.datomic.net", :server-port 8182}}}
   :at [datomic.client.impl.cloud$get_s3_auth_path invokeStatic "cloud.clj" 179]}]
 :trace
 [[datomic.client.impl.cloud$get_s3_auth_path invokeStatic "cloud.clj" 179]
  [datomic.client.impl.cloud$get_s3_auth_path invoke "cloud.clj" 170]
  [datomic.client.impl.cloud$create_client invokeStatic "cloud.clj" 211]
  [datomic.client.impl.cloud$create_client invoke "cloud.clj" 194]
```

On the access client side it seems to connect and forward okay:

```
debug1: Connection to port 8182 forwarding to socks port 0 requested.
debug1: channel 2: new [dynamic-tcpip]
debug1: channel 2: free: direct-tcpip: listening port 8182 for entry.datomic-solo.us-east-2.datomic.net port 8182, connect from 127.0.0.1 port 59870 to 127.0.0.1 port 8182, nchannels 3
```

This is where I'm a bit stuck and not sure where to debug next. Does anyone have
suggestions for what to poke or try next? Thanks much.

1 Answer

by

Hi Brad,

I suspect you are affected by an issue we have identified with the analytics gateway: the Presto server is unable to start on your gateway because a security update on the EC2 instance moved it to a new version of Java that is incompatible with the version of Presto that ships with analytics. We are working this weekend to address this issue. In the meantime, I have posted a workaround here:

http://ask.datomic.com/index.php/522/cloud-analytics-presto-server-cant-start

I will notify both threads as soon as we have shipped the fix.

by
Jaret,
Thanks so much, you're exactly right. I tracked down the gateway logs and they're full of those Presto errors. I appreciate the work on the fix and will see if I can get access to the gateway node to apply the workaround. Thanks for the pointer and help.
by
Jaret,
We tried this fix, and while it does resolve the Presto error and the CloudWatch logs on the bastion look nice and clean now, it still doesn't resolve the original issue. Bummer. For a little more detail on the initial post: I tried feeding AWS credentials that shouldn't work to the `d/client` request and got the same behavior and error, a long wait followed by a timeout. So it seems like the SSH tunnel is good, but for some reason the client credential negotiation isn't happening. Sorry, I am stumped about what is going wrong. Any suggestions for how to debug or poke at this more are very welcome. Thank you again.
by
Hi Brad,

We released a fix for the Presto issue in the latest version, 732-8992. Regarding your initial connection issue, the REPL you are launching from also needs to have credentials sourced to be able to reach the machine. Could you share your config map, if you're confident that you have the right credentials in the REPL? If it's easier to share more sensitive information, we can move this over to a support case by e-mailing support@cognitect.com. I can then circle back here when we have worked out a solution.
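As a rough illustration of checking that credentials are sourced, the sketch below reports which of the standard AWS credential sources are visible to the shell that will launch the REPL. It assumes the usual AWS environment variables and the default shared credentials file location; the function name `aws_credential_sources` is made up for this example, and the check only tests for presence, not whether the credentials are valid or have access to the Datomic system.

```shell
# Hypothetical helper: list the standard AWS credential sources this shell can see.
# It checks presence only; it does not validate the credentials themselves.
aws_credential_sources() {
  found=""
  # Static keys exported in the environment
  if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    found="${found}env "
  fi
  # A named profile selected via AWS_PROFILE
  if [ -n "${AWS_PROFILE:-}" ]; then
    found="${found}profile:${AWS_PROFILE} "
  fi
  # The shared credentials file (default location unless overridden)
  if [ -f "${AWS_SHARED_CREDENTIALS_FILE:-$HOME/.aws/credentials}" ]; then
    found="${found}credentials-file "
  fi
  if [ -z "$found" ]; then
    echo "none"
    return 1
  fi
  # Drop the trailing space before printing
  echo "${found% }"
}
```

Running `aws_credential_sources` in the same shell, just before starting the REPL, shows what the REPL process will inherit. If the AWS CLI is installed, `aws sts get-caller-identity` run from that shell additionally shows which identity those credentials resolve to.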
by
Jaret,
Thanks so much for the Presto fix and for the help debugging authentication. I do have credentials sourced in the REPL and have tried with both correct and incorrect credentials, hoping to trigger a different kind of error with the wrong ones, but I get the same error message as above. The config map is:
```
{:server-type :cloud
 :region "us-east-2"
 :system "datomic-solo"
 :endpoint "http://entry.datomic-solo.us-east-2.datomic.net:8182"
 :proxy-port 8182}
```
Is that timeout error what I should expect to get with incorrect credentials? Are there any other logs or areas I can poke at? Sorry to be so clueless about how to unstick this. I really appreciate all the help.