  • Rod Christiansen

    6 days ago
    Been following the README but can't quite get there
  • zwass

    6 days ago
    Where are you stuck? Any error messages?
  • Rod Christiansen

    6 days ago
    Hey Zach, thanks for jumping in
    Let me get you a log
    So when I run
    terraform show
    I get:
    If I re-run
    terraform apply -var-file=prod.tfvars
    I get a lot of
    │ Error: failed creating IAM Role (fleetdm-role): EntityAlreadyExists: Role with name fleetdm-role already exists.

    It’s trying to re-create some resources. Is there a way to skip them?
    Still happens after running a
    terraform destroy
    as well
    Essentially, I’m not sure if I have everything up
  • zwass

    6 days ago
    Hmm, sounds like you first ran it without the vars file, then tried again with the vars file?
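    (As an aside, one way to check for this kind of drift is to compare what Terraform already tracks in its state against what the config wants to create; a minimal sketch, assuming the same working directory and the prod.tfvars file from the guide:)
    # List every resource Terraform currently tracks in its state
    terraform state list

    # Preview what would be created/changed with the vars file applied,
    # without touching any real infrastructure
    terraform plan -var-file=prod.tfvars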
  • Rod Christiansen

    6 days ago
    Mmmm that’s a possibility? I might’ve in one of a few attempts
  • zwass

    6 days ago
    If you do the terraform destroy, does it seem to destroy everything?
  • Rod Christiansen

    6 days ago
    So it created resources not attached to my vars and now it can’t interact with them?
    Sorry, kind of new to Terraform
  • zwass

    6 days ago
    Yeah no prob. I'm not our best Terraform expert, but I have some experience and I'll try to help.
    Mind sharing your vars file with me via DM?
  • Rod Christiansen

    6 days ago
    Sure
  • zwass

    6 days ago
    Let's see if we can get back to a clean slate... What's the output of your terraform destroy?
  • Rod Christiansen

    6 days ago
    Let me run that
    Destroy complete! Resources: 53 destroyed.
  • zwass

    6 days ago
    Okay let's retry
    terraform apply -var-file=prod.tfvars
  • Rod Christiansen

    6 days ago
    Okay 👍
  • zwass

    6 days ago
    I think you'll be okay with those warnings, but I'm going to file an issue so that we can quiet them on our end.
  • Rod Christiansen

    6 days ago
    K thanks, it’s running
    usually takes 15 min
    Yeah, same results
    I think you might be right here about where the conflict comes from: https://osquery.slack.com/archives/C01DXJL16D8/p1655483457765809?thread_ts=1655481863.996759&cid=C01DXJL16D8
    aws_route53_record.dogfood_fleetdm_com_validation["fleet.ecuad.ca"]: Creating...
    aws_route53_record.dogfood_fleetdm_com_validation["fleet.ecuad.ca"]: Still creating... [10s elapsed]
    aws_route53_record.dogfood_fleetdm_com_validation["fleet.ecuad.ca"]: Still creating... [20s elapsed]
    aws_route53_record.dogfood_fleetdm_com_validation["fleet.ecuad.ca"]: Still creating... [30s elapsed]
    aws_route53_record.dogfood_fleetdm_com_validation["fleet.ecuad.ca"]: Still creating... [40s elapsed]
    aws_route53_record.dogfood_fleetdm_com_validation["fleet.ecuad.ca"]: Still creating... [50s elapsed]
    aws_route53_record.dogfood_fleetdm_com_validation["fleet.ecuad.ca"]: Creation complete after 52s [id=Z0153789AGKAV73DDKKN__3b82f3c76c9877eb0905c1f97d84050c.fleet.ecuad.ca._CNAME]
    aws_acm_certificate_validation.dogfood_fleetdm_com: Creating...
    aws_acm_certificate_validation.dogfood_fleetdm_com: Creation complete after 0s [id=2022-06-17 16:48:19.658 +0000 UTC]
    ╷
    │ Warning: Argument is deprecated
    │ 
    │   with aws_s3_bucket.osquery-results,
    │   on firehose.tf line 7, in resource "aws_s3_bucket" "osquery-results":
    │    7: resource "aws_s3_bucket" "osquery-results" { #tfsec:ignore:aws-s3-encryption-customer-key:exp:2022-07-01  #tfsec:ignore:aws-s3-enable-versioning #tfsec:ignore:aws-s3-enable-bucket-logging:exp:2022-06-15
    │ 
    │ Use the aws_s3_bucket_lifecycle_configuration resource instead
    │ 
    │ (and 8 more similar warnings elsewhere)
    ╵
    ╷
    │ Error: failed creating IAM Role (fleetdm-role): EntityAlreadyExists: Role with name fleetdm-role already exists.
    │ 	status code: 409, request id: d2dcb2ad-bfa4-49e9-8362-6a5fb48d5fdd
    │ 
    │   with aws_iam_role.main,
    │   on ecs-iam.tf line 74, in resource "aws_iam_role" "main":
    │   74: resource "aws_iam_role" "main" {
    │ 
    ╵
    ╷
    │ Error: error creating application Load Balancer: DuplicateLoadBalancerName: A load balancer with the same name 'fleetdm' exists, but with different settings
    │ 	status code: 400, request id: e49a8a95-68ef-4ec6-9b04-5860b251dab2
    │ 
    │   with aws_alb.main,
    │   on ecs.tf line 14, in resource "aws_alb" "main":
    │   14: resource "aws_alb" "main" {
    │ 
    ╵
    ╷
    │ Error: Creating CloudWatch Log Group failed: ResourceAlreadyExistsException: The specified log group already exists:  The CloudWatch Log Group 'fleetdm' already exists.
    │ 
    │   with aws_cloudwatch_log_group.backend,
    │   on ecs.tf line 114, in resource "aws_cloudwatch_log_group" "backend":
    │  114: resource "aws_cloudwatch_log_group" "backend" { #tfsec:ignore:aws-cloudwatch-log-group-customer-key:exp:2022-07-01
    │ 
    ╵
    ╷
    │ Error: error creating S3 Bucket (ca-ecuad-queryops-fleet-osquery-results-archive-dev): BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
    │ 	status code: 409, request id: 8QKQAWKMKFPG8V3T, host id: cGLTOq4ot2jEU2WRo6W7KOFHMqUUGEDol93rope13+e2btUrMvzII5SEHItCYT+99PGKR53PQcU=
    │ 
    │   with aws_s3_bucket.osquery-results,
    │   on firehose.tf line 7, in resource "aws_s3_bucket" "osquery-results":
    │    7: resource "aws_s3_bucket" "osquery-results" { #tfsec:ignore:aws-s3-encryption-customer-key:exp:2022-07-01  #tfsec:ignore:aws-s3-enable-versioning #tfsec:ignore:aws-s3-enable-bucket-logging:exp:2022-06-15
    │ 
    ╵
    ╷
    │ Error: error creating S3 Bucket (ca-ecuad-queryops-fleet-osquery-status-archive-dev): BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
    │ 	status code: 409, request id: 8QKS5J9XHNHQ6939, host id: SRTEVZrLHrsc9qt4Sii5tNnZtahUO60eEhaDZdW2KcaYjFKRqMWOdLX7trgWHh4kv8Guk7hmkpY=
    │ 
    │   with aws_s3_bucket.osquery-status,
    │   on firehose.tf line 41, in resource "aws_s3_bucket" "osquery-status":
    │   41: resource "aws_s3_bucket" "osquery-status" { #tfsec:ignore:aws-s3-encryption-customer-key:exp:2022-07-01 #tfsec:ignore:aws-s3-enable-versioning #tfsec:ignore:aws-s3-enable-bucket-logging:exp:2022-06-15
    │ 
    ╵
    ╷
    │ Error: error creating Secrets Manager Secret: ResourceExistsException: The operation failed because the secret /fleet/database/password/master already exists.
    │ 
    │   with aws_secretsmanager_secret.database_password_secret,
    │   on rds.tf line 7, in resource "aws_secretsmanager_secret" "database_password_secret":
    │    7: resource "aws_secretsmanager_secret" "database_password_secret" { #tfsec:ignore:aws-ssm-secret-use-customer-key:exp:2022-07-01
    │ 
    ╵
    ╷
    │ Error: Error creating DB Parameter Group: DBParameterGroupAlreadyExists: Parameter group fleetdm-aurora-db-mysql-parameter-group already exists
    │ 	status code: 400, request id: 58d16106-fe62-4906-b133-cccfabdb4d42
    │ 
    │   with aws_db_parameter_group.example_mysql,
    │   on rds.tf line 107, in resource "aws_db_parameter_group" "example_mysql":
    │  107: resource "aws_db_parameter_group" "example_mysql" {
    │ 
    ╵
    ╷
    │ Error: Error creating DB Cluster Parameter Group: DBParameterGroupAlreadyExists: Parameter group fleetdm-aurora-mysql-cluster-parameter-group already exists
    │ 	status code: 400, request id: 96730c78-873b-4e42-bfca-3a34895dbcdd
    │ 
    │   with aws_rds_cluster_parameter_group.example_mysql,
    │   on rds.tf line 113, in resource "aws_rds_cluster_parameter_group" "example_mysql":
    │  113: resource "aws_rds_cluster_parameter_group" "example_mysql" {
    │ 
    ╵
    ╷
    │ Error: error creating S3 Bucket (osquery-carve-default): BucketAlreadyExists: The requested bucket name is not available. The bucket namespace is shared by all users of the system. Please select a different name and try again.
    │ 	status code: 409, request id: BYZ1KZ1KNRCAJ0N6, host id: 5gY0n/SnlpvTsDu0zsCishhV3c4GUrirj3knAb7cTmc8MNCEcj50oKHkiIqkNAhv7bC+rcKeFdE=
    │ 
    │   with aws_s3_bucket.osquery-carve,
    │   on s3.tf line 9, in resource "aws_s3_bucket" "osquery-carve":
    │    9: resource "aws_s3_bucket" "osquery-carve" { #tfsec:ignore:aws-s3-enable-versioning #tfsec:ignore:aws-s3-encryption-customer-key:exp:2022-07-01 #tfsec:ignore:aws-s3-enable-bucket-logging:exp:2022-06-15
    │ 
    ╵
    ╷
    │ Error: Error creating DB Subnet Group: DBSubnetGroupAlreadyExists: The DB subnet group 'fleetdm-mysql-iam' already exists.
    │ 	status code: 400, request id: 4ec6cead-a4a7-45a2-a46b-50af416b34fb
    │ 
    │   with module.aurora_mysql.aws_db_subnet_group.this[0],
    │   on .terraform/modules/aurora_mysql/main.tf line 38, in resource "aws_db_subnet_group" "this":
    │   38: resource "aws_db_subnet_group" "this" {
    │ 
    ╵
    ╷
    │ Error: Error creating DB Subnet Group: DBSubnetGroupAlreadyExists: The DB subnet group 'fleet-vpc' already exists.
    │ 	status code: 400, request id: 6785406d-6e31-4016-b3ba-61df708bb2ec
    │ 
    │   with module.vpc.aws_db_subnet_group.database[0],
    │   on .terraform/modules/vpc/main.tf line 458, in resource "aws_db_subnet_group" "database":
    │  458: resource "aws_db_subnet_group" "database" {
    │ 
    ╵
    ╷
    │ Error: creating ElastiCache Subnet Group (fleet-vpc): CacheSubnetGroupAlreadyExists: Cache subnet group fleet-vpc already exists.
    │ 	status code: 400, request id: 1d124b61-29b4-46e1-a372-fc2e5fcb4b77
    │ 
    │   with module.vpc.aws_elasticache_subnet_group.elasticache[0],
    │   on .terraform/modules/vpc/main.tf line 542, in resource "aws_elasticache_subnet_group" "elasticache":
    │  542: resource "aws_elasticache_subnet_group" "elasticache" {
    │ 
    ╵
    ╷
    │ Error: Error creating EIP: AddressLimitExceeded: The maximum number of addresses has been reached.
    │ 	status code: 400, request id: a840976c-a6a1-45fe-8f1e-c155a458d6b1
    │ 
    │   with module.vpc.aws_eip.nat[0],
    │   on .terraform/modules/vpc/main.tf line 1001, in resource "aws_eip" "nat":
    │ 1001: resource "aws_eip" "nat" {
    │ 
    ╵
    Releasing state lock. This may take a few moments...
    rod@RodChristiansen aws %

    I prob need to manually delete all these resources that already exist
    Wonder if just changing some variables would give me a cleaner run
    New prefix
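    (An alternative to deleting everything by hand is to adopt the pre-existing resources into Terraform state with terraform import; a minimal sketch, using the role and log group names from the errors above as examples, so the exact addresses and IDs should be taken from your own output:)
    # Adopt the existing IAM role into state so Terraform stops trying to create it
    terraform import -var-file=prod.tfvars aws_iam_role.main fleetdm-role

    # Same idea for the CloudWatch log group (the import ID is the log group name)
    terraform import -var-file=prod.tfvars aws_cloudwatch_log_group.backend fleetdm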
  • zwass

    6 days ago
    Sounds like @Benjamin Edwards (our expert who wrote the configs) will be around soon. Let's wait for him to come and advise 🙂
  • Rod Christiansen

    6 days ago
    Oh amazing!
  • Benjamin Edwards

    6 days ago
    Hey Rod! Sorry you are having issues. I'm just waiting for my wife to get home and take over kiddo duty. Happy to help after that. I'll try to catch up in the meantime.
  • Rod Christiansen

    6 days ago
    All good. Thanks to both of you
  • Benjamin Edwards

    6 days ago
    @Rod Christiansen I think Zach had the right idea: it seems like Terraform might have run with the default values, which will work for some resources but conflict with others (like S3, since bucket names are globally unique across all AWS accounts). Another thing to consider, especially if you are new to Terraform, is to omit the remote state steps in the guide. Remote state is great for bigger projects, or for projects where multiple users (or CI/CD systems) are altering infrastructure at the same time. To start with, it might be easier to just leave it out, something I should have considered in the guide but unfortunately didn't. So first, in main.tf, edit the terraform block to look like:
    terraform {
    #  backend "s3" {
    #    bucket         = "fleet-terraform-remote-state"
    #    region         = "us-east-2"
    #    key            = "fleet"
    #    dynamodb_table = "fleet-terraform-state-lock"
    #  }
      required_providers {
        aws = {
          source  = "hashicorp/aws"
          version = "3.63.0"
        }
    
        tls = {
          source = "hashicorp/tls"
          version = "3.3.0"
        }
      }
    }
    Then make sure you follow this step of the guide:
    We’ll also need a tfvars file to make some environment-specific variable overrides. Create a file in the same directory named prod.tfvars and paste the contents (note the bucket names will have to be unique for your environment):
    fleet_backend_cpu         = 1024
    fleet_backend_mem         = 4096 //software inventory requires 4GB
    redis_instance            = "cache.t3.micro"
    domain_fleetdm            = "fleet.queryops.com" // YOUR DOMAIN HERE
    osquery_results_s3_bucket = "foo-results-bucket" // UNIQUE BUCKET NAME
    osquery_status_s3_bucket  = "bar-status-bucket" // UNIQUE BUCKET NAME
    file_carve_bucket         = "qux-file-carve-bucket" // UNIQUE BUCKET NAME
    If you run into trouble, maybe we can screen share on zoom and get it sorted out?
  • Rod Christiansen

    6 days ago
    Hi @Benjamin Edwards thanks so much, this helps. Thanks for offering to jump on a screen share. I’m going to go through my AWS and see if I can find and delete all the resources listed with errors
    I’ll pipe back in here with results
    One other quick question you might know the answer to, @Benjamin Edwards: do the NS records get re-created on every run? Will I end up getting new NS records from the AWS hosted zone? I got mine set by our network person, who is great but slow to respond (ha), and I try to avoid asking him for help. I probably won’t be able to use the existing ones set on our domain side?
    Ah sorry these aren’t really Fleet questions but more AWS Qs 😬
  • Benjamin Edwards

    6 days ago
    If you destroy them, they'll be recreated. Otherwise TF knows they already exist.
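    (If it helps to confirm whether the delegation actually changed, the current NS records for the zone can be read back with the AWS CLI; a minimal sketch, using the fleet.ecuad.ca zone from the apply output above and assuming the zone ID shown there is your hosted zone ID:)
    # Find the hosted zone ID for the domain
    aws route53 list-hosted-zones-by-name --dns-name fleet.ecuad.ca

    # Show the NS record set that the registrar/network person needs to point at
    aws route53 list-resource-record-sets \
      --hosted-zone-id Z0153789AGKAV73DDKKN \
      --query "ResourceRecordSets[?Type=='NS']"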
  • Rod Christiansen

    3 days ago
    Got it now to the point where it doesn’t complain about any existing resources, but I’m still getting a single error at the end of the build. Looks like the TCP call is getting refused and it’s hanging up the phone
    ╷
    │ Error: error creating ElastiCache Replication Group (fleetdm-redis): waiting for completion: RequestError: send request failed
    │ caused by: Post "https://elasticache.ca-central-1.amazonaws.com/": read tcp 192.168.1.137:55529->52.94.100.101:443: read: connection reset by peer
    │ 
    │   with aws_elasticache_replication_group.default,
    │   on redis.tf line 13, in resource "aws_elasticache_replication_group" "default":
    │   13: resource "aws_elasticache_replication_group" "default" {
    │ 
    ╵

    Hey! I got a fully clean run! \o/
    Sorry for calling again ☎️
  • zwass

    2 days ago
    Ah, yes, this is an annoying thing with Docker's rate limiting. Now that the migration completed, do the other tasks successfully start up?
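    (One way to check is in the ECS console, or with the CLI; a minimal sketch, with the cluster and service names as placeholders since the real ones come out of the Terraform config:)
    # Compare how many tasks the service wants vs. how many are actually running,
    # and peek at recent service events (e.g. image pull failures)
    aws ecs describe-services --cluster <your-cluster> --services <your-fleet-service> \
      --query "services[0].{desired:desiredCount,running:runningCount,events:events[:3]}"

    # List the running tasks in the cluster
    aws ecs list-tasks --cluster <your-cluster>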
  • Rod Christiansen

    2 days ago
    Doesn't look like it
    I was just surprised to see Docker in the mix
    Do I need to get an account?
  • zwass

    2 days ago
    No you don't need an account -- Docker is just where the container images are hosted.
    Can you set the target tasks on the service down to 0? Then try running the migrate again?
  • Rod Christiansen

    2 days ago
    Will do. Thanks.
  • zwass

    2 days ago
    You might still get rate limited for a while -- sorry about that, Docker Hub can really be a pain.
  • Rod Christiansen

    2 days ago
    Got it. As long as it's not a config error on my end. Just try until it works (not too often 😅)
  • Benjamin Edwards

    2 days ago
    Zach is right. The migration task needs to execute. Scaling down the fleet service to zero for 15-30 minutes should clear your rate limit. While the service is scaled down, attempt the migration task, which is the ecs run-task command. Once that runs successfully, you can scale the service back up.
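    (Roughly, that flow with the AWS CLI looks like the sketch below; the cluster, service, task definition, subnet, and security group names are placeholders, since the real values come out of the Terraform config/outputs, and it assumes a Fargate launch type:)
    # 1. Scale the Fleet service down so it stops pulling images and burning the rate limit
    aws ecs update-service --cluster <your-cluster> --service <your-fleet-service> --desired-count 0

    # 2. After waiting ~15-30 minutes, run the one-off migration task
    aws ecs run-task --cluster <your-cluster> \
      --task-definition <your-fleet-migrate-task-def> \
      --launch-type FARGATE \
      --network-configuration "awsvpcConfiguration={subnets=[<subnet-id>],securityGroups=[<sg-id>],assignPublicIp=ENABLED}"

    # 3. Once the migration task exits successfully, scale the service back up
    aws ecs update-service --cluster <your-cluster> --service <your-fleet-service> --desired-count 1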
  • zwass

    2 days ago
    Woooo hooo!
  • Rod Christiansen

    2 days ago
    Very exciting. Just gotta get my network guy to set those NameServers now ha
  • Benjamin Edwards

    2 days ago
    Rod, feel free to hit me up with feedback about the process. Happy to make updates to the blog or modify the readme. Your perspective, as someone relatively new to TF, was something I was clearly missing when I drafted the first version.
  • Rod Christiansen

    1 day ago
    Couple things that came to mind that you could add to the blog post:
    • warning about not running multiple times like I did; I set up from scratch a few times ‘to start fresh’ but hadn’t run a terraform apply -destroy
    • warning about unique naming for the buckets/dynamo since they are global
    • warning on the migration step about the Docker rate limit like I hit and how to retry
    • the hashicorp/aws version needs to match
    •