Deploying GeoSpock DB Discovery

Here you will find instructions and links to the resources you will need to deploy GeoSpock DB Discovery in your AWS environment.

How Discovery is deployed

GeoSpock DB Discovery is deployed as a CloudFormation stack, from a template provided by GeoSpock.

The CloudFormation template will deploy and configure resources in your AWS environment, giving you the opportunity to explore and run queries against a fixed set of open datasets and to experience first-hand the power and capabilities of GeoSpock DB.

What is deployed

The CloudFormation template deploys a "query cluster" of EC2 instances sitting behind an Application Load Balancer. The query cluster consists of a single "coordinator" machine and a customisable number of "worker" machines. The number of worker machines is controlled with an input parameter, specified when the CloudFormation stack is deployed. The more worker machines you have, the faster, in general, your queries will run. GeoSpock DB makes use of Presto technology. Presto and the GeoSpock software are installed on all the machines in the cluster.

The deployment is automatically configured to connect to the pre-ingested datasets which are hosted by GeoSpock.

Prerequisites

GeoSpock DB Discovery will only work if your AWS account has been granted access to the GeoSpock resources. Appropriate access will only have been set up if you have already registered for a GeoSpock DB Discovery trial and you have received an email to confirm your setup.

To complete the deployment successfully, you will need to be signed in to the same AWS account that you provided during registration. Your IAM user or assumed role will also need permission to create and modify a number of resources in that account.

A minimal policy can be found below.

To check your permissions, sign in to the account as an administrator or root user, or contact your AWS account administrator.

Deployment

To begin the deployment of GeoSpock DB Discovery in your AWS account, follow the link below to load the CloudFormation template within the console:

Launch a GeoSpock DB Discovery Deployment

If the console reports “S3 error: Access Denied” it is likely that your GeoSpock DB Discovery trial has not been activated, or else it has expired. If you have received confirmation of your trial being activated and are still within the trial period, please contact the GeoSpock team for support.

Note that GeoSpock DB Discovery is always deployed in the same AWS region as the sample data. This is to minimise the costs of retrieving the data when queries are run. The deployment region is Singapore (ap-southeast-1).

Parameters

You will need to provide values for the following parameters. More details for these parameters can be found in the sections below.

  • Stack Name - A suitable name that describes what resources the stack creates.
  • VPCId - The ID of an existing VPC to deploy the Discovery resources into.
  • SubnetIds - The IDs of at least two existing subnets, in at least two different availability zones, that belong to the VPC selected above.
  • PrestoWorkerEC2InstanceCount - The number of Presto worker nodes to be created in the stack.
  • SourceCIDR - The primary external IP address range that will be used to connect into the Discovery cluster.
  • SSLCertificateARN (optional) - The ARN of an existing SSL certificate to enable HTTPS connections into the Discovery cluster.
  • AdditionalSourceCIDR1 (optional) - An additional external IP address range that will be used to connect into the Discovery cluster.
  • AdditionalSourceCIDR2 (optional) - Another additional external IP address range that will be used to connect into the Discovery cluster.
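
If you prefer to prepare the parameter values in a script before deploying, the list above maps onto the key/value structure that the CloudFormation API uses for stack parameters. The Python sketch below is illustrative only: the helper name build_parameters is hypothetical, every value shown is a placeholder, and it assumes the template accepts SubnetIds as a comma-separated list, which you should confirm against the template.

```python
def build_parameters(vpc_id, subnet_ids, worker_count, source_cidr,
                     ssl_certificate_arn=None, additional_cidrs=()):
    """Return Discovery stack parameters as CloudFormation-style dicts."""
    if len(subnet_ids) < 2:
        raise ValueError("at least two subnet IDs are required")
    if len(additional_cidrs) > 2:
        raise ValueError("at most two additional CIDR ranges are supported")

    values = {
        "VPCId": vpc_id,
        # Assumption: list parameters are passed as one comma-separated string.
        "SubnetIds": ",".join(subnet_ids),
        "PrestoWorkerEC2InstanceCount": str(worker_count),
        "SourceCIDR": source_cidr,
    }
    # Optional parameters are only included when a value was supplied.
    if ssl_certificate_arn:
        values["SSLCertificateARN"] = ssl_certificate_arn
    for index, cidr in enumerate(additional_cidrs, start=1):
        values[f"AdditionalSourceCIDR{index}"] = cidr

    return [{"ParameterKey": key, "ParameterValue": value}
            for key, value in values.items()]


# Placeholder values for illustration only.
parameters = build_parameters(
    vpc_id="vpc-0123456789abcdef0",
    subnet_ids=["subnet-aaaa1111", "subnet-bbbb2222"],
    worker_count=3,
    source_cidr="81.122.131.205/32",
)
```

The Stack Name is not included because it is supplied separately from the template parameters.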

Specifying a VPC ID

GeoSpock DB Discovery resources must be deployed into a VPC with public subnets. Each AWS region provides a default VPC as standard, which can be selected from the dropdown. The default VPC has the required configuration to run Discovery, so it is not essential to create a new one. You are free to create a separate VPC if you wish - if you do, you will need to ensure that there are at least two public subnets attached to an Internet Gateway.

A guide to creating VPCs can be found here: Getting started with Amazon VPC.

Selecting Subnet IDs

Please select at least two subnet IDs from the dropdown. These subnets must be attached to the VPC selected above. Unless you have created a new VPC, the subnets in the dropdown will all belong to the default VPC.
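
The subnet requirements can be checked before you deploy. The following Python sketch is a hypothetical helper (check_subnets is not part of any GeoSpock tooling) that takes subnet descriptions using the field names returned by the EC2 DescribeSubnets API (SubnetId, VpcId, AvailabilityZone) and reports any problems. It does not check whether the subnets are public.

```python
def check_subnets(subnets, vpc_id):
    """Return a list of problems with the chosen subnets (empty if none)."""
    problems = []
    if len(subnets) < 2:
        problems.append("select at least two subnets")

    # Every subnet must belong to the VPC chosen for the deployment.
    wrong_vpc = [s["SubnetId"] for s in subnets if s["VpcId"] != vpc_id]
    if wrong_vpc:
        problems.append("not in the selected VPC: " + ", ".join(wrong_vpc))

    # The subnets must be spread over at least two availability zones.
    zones = {s["AvailabilityZone"] for s in subnets}
    if len(zones) < 2:
        problems.append("subnets must span at least two availability zones")
    return problems
```

An empty result means the selection satisfies the rules described above.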

Controlling the Number of Workers

Use the PrestoWorkerEC2InstanceCount parameter to control how many worker nodes you wish to have in your GeoSpock DB Discovery cluster. The more worker machines you have in your cluster, the faster your queries will be in general. Note that a bigger cluster is recommended when working with larger datasets; for instance, some of our examples recommend a cluster size of 10 workers. On the other hand, a query cluster targeting small datasets will operate adequately with just a single worker instance.

GeoSpock DB Discovery allows you to experiment with different cluster sizes. You can change the number of workers in your cluster at any time by updating the stack.

Specifying Source CIDR Ranges

For security reasons, access to Discovery is restricted to specified external IP addresses (CIDR ranges or "blocks"). Up to three CIDR ranges can be specified as parameters during the deployment of the CloudFormation stack. Only one of these parameters (SourceCIDR) is required; the other two (AdditionalSourceCIDR1 and AdditionalSourceCIDR2) are optional.

To determine your own external IP address, you can use the website https://checkip.amazonaws.com/. When specifying the Source CIDR parameters in the CloudFormation console, enter the IPv4 address and add /32 at the end, e.g. 81.122.131.205/32.

If you wish to connect a number of machines on a network, your network administrator should be able to provide the CIDR range for those machines.
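
As a small illustration of the /32 convention, the Python sketch below uses the standard ipaddress module to turn a bare IPv4 address into CIDR notation and to reject malformed ranges. The function name to_source_cidr is hypothetical.

```python
import ipaddress


def to_source_cidr(value):
    """Normalise an IPv4 address or range into CIDR notation."""
    # A bare address such as "81.122.131.205" is treated as a /32 network.
    # strict=True rejects ranges with host bits set, e.g. "10.0.0.5/24".
    network = ipaddress.IPv4Network(value.strip(), strict=True)
    return str(network)


print(to_source_cidr("81.122.131.205"))
```

Invalid input raises a ValueError, so mistakes are caught before the value is entered in the CloudFormation console.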

SSL Certificates for HTTPS (optional)

It is recommended that you secure your connection to the Discovery deployment using SSL. To do this, you will need to request or import a certificate using the AWS Certificate Manager (ACM) service in the AWS console. Please note that the certificate must be in the same region that the Discovery resources are deployed into, Singapore (ap-southeast-1).

Note that there are further steps to take, once the stack has been created, to complete the process of enabling SSL. See Adding a CNAME record below.

Deploying your CloudFormation stack

Once all the required parameters have been filled in, you will need to read and accept the statement "I acknowledge that AWS CloudFormation might create IAM resources" and then click “Create Stack”. The deployment process will take around 5 minutes, after which refreshing the console will show that the stack status has changed to CREATE_COMPLETE. Note that it may take a further few minutes before the deployment is ready to run queries.
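
If you are scripting the deployment rather than refreshing the console, the wait for CREATE_COMPLETE can be expressed as a simple polling loop. The Python sketch below is a minimal outline under stated assumptions: get_status is a hypothetical callable that you supply to return the current stack status string, and the polling interval and timeout are arbitrary choices rather than GeoSpock recommendations.

```python
import time

# CloudFormation statuses that mean stack creation will not succeed.
FAILED_STATUSES = {
    "CREATE_FAILED",
    "ROLLBACK_IN_PROGRESS",
    "ROLLBACK_COMPLETE",
    "ROLLBACK_FAILED",
}


def wait_for_create(get_status, poll_seconds=15, timeout_seconds=900,
                    sleep=time.sleep):
    """Poll get_status() until the stack reports CREATE_COMPLETE."""
    waited = 0
    while True:
        status = get_status()
        if status == "CREATE_COMPLETE":
            return status
        if status in FAILED_STATUSES:
            raise RuntimeError(f"stack creation failed with status {status}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"stack still {status} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
```

Remember that the cluster may need a further few minutes after CREATE_COMPLETE before it can run queries.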

Outputs

Once the stack has been created, you can click on the Outputs tab in the CloudFormation console to find the hostname value (labelled GeoSpockDBDiscoveryHostName) for the new deployment, which will be similar to disco-Prest-26Z2LD7S11IPM-2177915702.ap-southeast-1.elb.amazonaws.com.

Make a note of this hostname - if connecting using HTTP (without SSL) the hostname will form part of the "connection string" you will need to specify when connecting third-party tools and SQL client applications to the GeoSpock DB Discovery database.

Otherwise, you will need the hostname in the next step to complete the setup of SSL.
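
The same hostname can be read programmatically from the stack description. The Python sketch below assumes a response shaped like the CloudFormation DescribeStacks result, where each stack has an Outputs list of OutputKey/OutputValue pairs. The function name get_discovery_hostname is hypothetical.

```python
def get_discovery_hostname(describe_stacks_response):
    """Extract the GeoSpockDBDiscoveryHostName output from a stack description."""
    outputs = describe_stacks_response["Stacks"][0].get("Outputs", [])
    for output in outputs:
        if output["OutputKey"] == "GeoSpockDBDiscoveryHostName":
            return output["OutputValue"]
    raise KeyError("GeoSpockDBDiscoveryHostName not found in stack outputs")
```

A KeyError here usually means the stack has not finished creating, since outputs are only available once creation completes.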

Adding a CNAME record (optional)

If you have opted to use HTTPS for a secure connection, and you specified an SSL certificate when deploying the stack above, you will now need to create a CNAME record in your DNS provider (e.g. AWS Route 53), using the same domain / subdomain as used by the certificate. This CNAME record must point to the hostname for the new deployment, labelled GeoSpockDBDiscoveryHostName in the CloudFormation output.

A guide to creating Route 53 records can be found here: Creating records using the Amazon Route 53 console.

Example

Suppose the SSL certificate you have is for the gsdiscovery.mycompany.com subdomain. You should now create a DNS CNAME for gsdiscovery.mycompany.com pointing to the hostname found in the CloudFormation stack output. Now, when connecting to the GeoSpock database using third-party tools, you should use gsdiscovery.mycompany.com in the "connection string".
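
If your DNS is hosted in Route 53 and you manage records by script, the CNAME in this example corresponds to a change batch like the one built below. This Python sketch only constructs the request body in the shape used by the Route 53 ChangeResourceRecordSets API. The function name cname_change_batch and the TTL of 300 seconds are assumptions for illustration.

```python
def cname_change_batch(record_name, target_hostname, ttl=300):
    """Build a Route 53 change batch that creates or updates a CNAME record."""
    return {
        "Comment": "Point the Discovery subdomain at the load balancer",
        "Changes": [
            {
                # UPSERT creates the record, or replaces it if it exists.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,  # assumed value; choose one that suits you
                    "ResourceRecords": [{"Value": target_hostname}],
                },
            }
        ],
    }


batch = cname_change_batch(
    "gsdiscovery.mycompany.com",
    "disco-Prest-26Z2LD7S11IPM-2177915702.ap-southeast-1.elb.amazonaws.com",
)
```

Replace the target with the hostname from your own stack output. The value shown is the sample hostname from the Outputs section.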

Setup complete

You are now ready to start running queries using your deployment. A good place to start is First Steps with Discovery.

Redeploying with different parameters

You can redeploy your Discovery stack with a different configuration at any time. In particular, you may wish to try using a different number of worker machines in your query cluster, to see the effect on query performance.

To change the number of worker nodes, simply open the CloudFormation service, and select the deployment region. Then select the Discovery stack, choose "Update", and change the number of worker nodes.
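
For scripted updates, the CloudFormation API lets you change one parameter while keeping the previous values of the others. The Python sketch below builds such a parameter list. The function name worker_count_update is hypothetical, and the lower bound of one worker is an assumption based on the single-worker cluster mentioned earlier.

```python
WORKER_KEY = "PrestoWorkerEC2InstanceCount"


def worker_count_update(new_count, existing_keys):
    """Build update parameters that change only the worker count."""
    # Assumption: a cluster needs at least one worker.
    if new_count < 1:
        raise ValueError("worker count must be at least 1")

    parameters = [{"ParameterKey": WORKER_KEY, "ParameterValue": str(new_count)}]
    # All other parameters keep the values from the current deployment.
    parameters += [
        {"ParameterKey": key, "UsePreviousValue": True}
        for key in existing_keys
        if key != WORKER_KEY
    ]
    return parameters
```

Pass in the parameter names that the stack was originally deployed with, so that optional parameters you did not set are left out.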

Removing all GeoSpock DB Discovery resources

You can destroy and redeploy your Discovery stack as many times as you like during the trial period. You can easily remove all resources that were created using CloudFormation from the CloudFormation console. Select the Discovery stack, choose "Delete", and then "Delete stack".

If you created a DNS CNAME record and ACM Certificate, these will need to be removed manually. Do ensure that the ACM certificate is not being used by any other resources before deleting it.

What happens at the end of the trial?

At the end of the trial period, you will lose access to the shared GeoSpock DB Discovery resources. This means that it will no longer be possible to deploy a new Discovery stack or redeploy an existing one with a new configuration. Any live deployment will continue to run, but SQL queries will fail with an "Access Denied" error, as they will be unable to connect to the shared datasets.

Please note that your CloudFormation stack will not shut down automatically. You will continue to be responsible for running costs until you delete it yourself from the CloudFormation service - see Removing all GeoSpock DB Discovery resources above.

Getting help

If you have any difficulty deploying the GeoSpock DB Discovery stack, and you cannot find the solution within the documentation, then please email discovery.support@geospock.com for assistance. Please provide as much information about the problem as possible, including details of any errors, as this will help us to diagnose any issues more quickly.

Minimal IAM policy for deploying Discovery

The following is a minimal IAM policy for deploying GeoSpock DB Discovery:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "acm:*",
                "autoscaling:CreateAutoScalingGroup",
                "autoscaling:CreateLaunchConfiguration",
                "autoscaling:DeleteAutoScalingGroup",
                "autoscaling:DeleteLaunchConfiguration",
                "autoscaling:Describe*",
                "autoscaling:UpdateAutoScalingGroup",
                "cloudformation:ContinueUpdateRollback",
                "cloudformation:CreateChangeSet",
                "cloudformation:CreateStack",
                "cloudformation:DeleteStack",
                "cloudformation:Describe*",
                "cloudformation:GetStackPolicy",
                "cloudformation:GetTemplate",
                "cloudformation:GetTemplateSummary",
                "cloudformation:ListStackResources",
                "cloudformation:ListStacks",
                "cloudformation:UpdateStack",
                "cloudwatch:DeleteAlarms",
                "cloudwatch:DescribeAlarms",
                "cloudwatch:PutMetricAlarm",
                "ec2:AllocateAddress",
                "ec2:AssociateAddress",
                "ec2:AuthorizeSecurityGroupIngress",
                "ec2:CreateNetworkInterface",
                "ec2:CreateSecurityGroup",
                "ec2:CreateTags",
                "ec2:DeleteNetworkInterface",
                "ec2:DeleteSecurityGroup",
                "ec2:Describe*",
                "ec2:DisassociateAddress",
                "ec2:ModifyInstanceAttribute",
                "ec2:RevokeSecurityGroupIngress",
                "ec2:ReleaseAddress",
                "ec2:RunInstances",
                "ec2:StartInstances",
                "ec2:StopInstances",
                "ec2:TerminateInstances",
                "elasticloadbalancing:AddTags",
                "elasticloadbalancing:CreateListener",
                "elasticloadbalancing:CreateLoadBalancer",
                "elasticloadbalancing:CreateTargetGroup",
                "elasticloadbalancing:DeleteListener",
                "elasticloadbalancing:DeleteLoadBalancer",
                "elasticloadbalancing:DeleteTargetGroup",
                "elasticloadbalancing:Describe*",
                "elasticloadbalancing:ModifyLoadBalancerAttributes",
                "elasticloadbalancing:RegisterTargets",
                "health:DescribeEventAggregates",
                "iam:AddRoleToInstanceProfile",
                "iam:CreateInstanceProfile",
                "iam:CreateRole",
                "iam:CreateServiceLinkedRole",
                "iam:DeleteInstanceProfile",
                "iam:DeleteRole",
                "iam:DeleteRolePolicy",
                "iam:GetInstanceProfile",
                "iam:GetRolePolicy",
                "iam:ListRoles",
                "iam:PassRole",
                "iam:PutRolePolicy",
                "iam:RemoveRoleFromInstanceProfile",
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:DeleteLogGroup",
                "logs:DescribeLogGroups",
                "logs:DescribeLogStreams",
                "logs:PutLogEvents",
                "logs:PutRetentionPolicy",
                "sns:ListTopics",
                "s3:Get*",
                "s3:List*"
            ],
            "Resource": "*"
        }
    ]
}

Note that you only need permissions for AWS Certificate Manager (the line "acm:*" in the policy above) if you intend to enable HTTPS.
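
If you do not intend to enable HTTPS, the ACM entry can be dropped from the policy before you attach it. The Python sketch below shows one way to do this on the policy loaded as a dictionary. The function name without_acm is hypothetical, and it assumes each statement's Action is a list, as it is in the policy above.

```python
import copy


def without_acm(policy):
    """Return a copy of the policy with all ACM actions removed."""
    trimmed = copy.deepcopy(policy)  # leave the caller's policy untouched
    for statement in trimmed["Statement"]:
        # Assumption: "Action" is a list of strings, as in the policy above.
        statement["Action"] = [
            action for action in statement["Action"]
            if not action.startswith("acm:")
        ]
    return trimmed
```

The result can be serialised back to JSON with json.dumps from the standard library.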