Laravel: Anonymize database on local environments for GDPR

If you often work with a copy of a production database on your local environment, the GDPR may require you to anonymize all personal data in it. I often ship my Laravel projects with an Artisan command called “CleanupPersonalData”, which contains the logic to overwrite all privacy-sensitive content in the database.

Looking for a similar solution? Note that this always requires manual implementation, since you need to review which columns in the database hold personal data and which do not.

Let's take a simple ‘User’ model as an example. After downloading a copy of your production database, you have a copy of all users, probably including columns such as firstname, lastname, email and password. The command below overwrites those columns with dummy data generated by Faker.

namespace App\Console\Commands;

use App\Models\User;
use Illuminate\Console\Command;
use Illuminate\Support\Facades\DB;

class CleanupPersonalData extends Command
{
    /**
     * The name and signature of the console command.
     *
     * @var string
     */
    protected $signature = 'cleanup:gdpr {aggressive?} {force?}';

    /**
     * The console command description.
     *
     * @var string
     */
    protected $description = 'Cleanup Personal data for local environments';

    /**
     * Create a new command instance.
     *
     * @return void
     */
    public function __construct()
    {
        parent::__construct();
    }

    /**
     * Execute the console command.
     *
     * @return mixed
     */
    public function handle()
    {
        // refuse to run on production: this command overwrites personal data irreversibly
        if (app()->environment('production')) {
            $this->error('This command should only be run in local/testing environments!');

            return 1;
        }

        // do we want to force overwrite contents, or check if this was anonymized before?
        // 'force' is the second positional argument, e.g. `php artisan cleanup:gdpr false true`
        $force = 'true' === $this->argument('force');

        // prepend this string to anonymized data (also to check if anonymized before)
        $anonymizeString = 'anonymized: ';

        // using Faker class to generate new dummy content
        $faker = \Faker\Factory::create('nl_NL');

        // START MODEL 1 (USER)

        // specify a main table-column you will be checking for if the data was already anonymized
        $mainColumn = 'lastname';

        // make sure we fetch trashed items as well (if soft-deletes is enabled)
        $items = User::withTrashed();

        // if not forced, check if anonymized before
        if (!$force) {
            $items
                ->where(function ($query) use ($mainColumn, $anonymizeString) {
                    // pass the pattern as a plain string so it is bound as a query parameter
                    $query->where($mainColumn, 'NOT LIKE', $anonymizeString.'%');
                    $query->orWhereNull($mainColumn);
                });
        }

        // get all items as a Collection
        $items = $items->get();
        $this->output->writeln(sprintf('%s users', $items->count()));

        // make it easy for us to login to test-users (do this once, since hashing consumes a lot of memory / cpu)
        $pass = bcrypt('123456');
        foreach ($items as $item) {
            // cast to string so a NULL column does not trigger a deprecation notice; check for the prefix only
            if ($force || strpos((string) $item->$mainColumn, $anonymizeString) !== 0) {
                $item->$mainColumn = $anonymizeString.$faker->lastName();
                $item->firstname = $faker->firstName();
                $item->email = $faker->safeEmail(); // example.com/.org/.net addresses, so no mail can reach a real person
                $item->password = $pass;
                $item->save();
            }
        }

        // END MODEL 1 (USER)

    }
}

 

Repeat the above for each model that holds personal data, and your local copy no longer contains real names, email addresses or passwords. Make a copy of your production database, import it into your local environment, run the command

php artisan cleanup:gdpr

and make sure to delete the original backup-file.
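
Repeating the whole block for every model gets verbose. One way to share the "was this anonymized before?" check is to move it into a small function. The sketch below is plain PHP and only an illustration: the function name anonymizedAttributes, its parameters and the generator closure are hypothetical and not part of the command above.

```php
<?php

/**
 * Hypothetical helper (not part of the command above): decide what to write
 * to one record. Returns the replacement attributes, or null when the main
 * column already starts with the marker and $force is false.
 */
function anonymizedAttributes(
    array $row,
    string $mainColumn,
    callable $generator,
    string $marker = 'anonymized: ',
    bool $force = false
): ?array {
    $current = (string) ($row[$mainColumn] ?? '');

    // skip records whose main column already carries the marker prefix
    if (!$force && strncmp($current, $marker, strlen($marker)) === 0) {
        return null;
    }

    // the generator returns fake values keyed by column name
    $attributes = $generator();
    $attributes[$mainColumn] = $marker.($attributes[$mainColumn] ?? '');

    return $attributes;
}

// Example with plain arrays standing in for a model's attributes
$fake = function () {
    return ['lastname' => 'Jansen', 'firstname' => 'Sanne'];
};

var_dump(anonymizedAttributes(['lastname' => 'Real Name'], 'lastname', $fake));
var_dump(anonymizedAttributes(['lastname' => 'anonymized: Jansen'], 'lastname', $fake));
```

The first call returns the fake values with the marker prepended to lastname, the second returns null because the record was already anonymized. Inside handle(), each model would then only need its own generator closure, and a non-null result could be applied with $item->forceFill($attributes)->save().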
