Breached Data Management Platform
Overview
Engaging with us enabled our client to build one of the largest searchable databases of cybersecurity breach entries in the world: 6 billion entries and growing!
Data breaches are security incidents in which confidential personally identifiable information (PII), such as user credentials, contact details, and financial information, is inadvertently exposed through vulnerable or insecure software systems. An American startup, still in stealth mode, wanted to make this data publicly accessible so that people could check whether their PII, or even their identity, had been exposed.
Requirements
Our solution
To begin with, the client needed multiple web applications that would let their target customer segments search and manage data breaches.
Over several years of engagement, the Synergenie team developed and maintained visually appealing, performant customer-facing web applications for multiple product lines. Stripe handled user billing and subscription management. Various external vendor and API integrations were also built for user identity protection, user profile integration, and data compromise features. A separate administration backend application allowed authorized users to perform administrative tasks.
Efficiently storing and retrieving the large volume of security breach data was a challenge.
Synergenie iterated through multiple proofs of concept to handle the huge volume of breached entries (over 6 billion and growing!). Functional proof-of-concept applications built on Google BigQuery and Amazon Redshift were eventually discarded in favor of Apache Cassandra, which offered one of the best trade-offs between price and performance.
Populating the breach database efficiently was a major challenge and time-consuming.
Breach data arrives in multiple formats, from SQL database dumps to text and CSV files, and does not have a fixed structure. Additionally, the files in a single data breach can range from kilobytes to tens of gigabytes. After studying the data and holding extended brainstorming sessions and consultations, the Synergenie team helped define a standardized process for data cleaning, extraction, and validation. We also built a web application, with point-and-click wizards, to load the data cleaned by Data Analysts into the breach database.
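To make the cleaning, extraction, and validation step more concrete, here is a minimal Python sketch of how a delimited breach file with inconsistent headers might be mapped to a standard record shape. It is an illustration only, not the process Synergenie actually defined: the function name normalize_rows, the HEADER_ALIASES table, the canonical field names, and the simple email pattern are all assumptions made for this example.

```python
import csv
import io
import re

# Hypothetical mapping from header variants found in raw dumps
# to canonical field names (assumed for this sketch).
HEADER_ALIASES = {
    "email": "email", "e-mail": "email", "mail": "email",
    "user": "username", "username": "username", "login": "username",
    "pass": "password", "password": "password", "passwd": "password",
}

# Deliberately loose email check; a real pipeline would likely be stricter.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def normalize_rows(raw_text, delimiter=","):
    """Yield cleaned records with canonical keys.

    Rows without a valid email are skipped, and exact duplicates
    (after normalization) are emitted only once.
    """
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=delimiter)
    seen = set()
    for row in reader:
        record = {}
        for header, value in row.items():
            # DictReader uses None for surplus columns and missing values.
            if header is None or value is None:
                continue
            key = HEADER_ALIASES.get(header.strip().lower())
            if key:
                record[key] = value.strip()

        # Validation: require a plausible email, stored in lowercase.
        email = record.get("email", "").lower()
        if not EMAIL_RE.match(email):
            continue
        record["email"] = email

        # De-duplicate on the full normalized record.
        fingerprint = tuple(sorted(record.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        yield record
```

Because the function is a generator that yields one record at a time, the same approach could be pointed at a file handle instead of an in-memory string, which matters when individual files reach tens of gigabytes.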
Results:
Synergenie developed multiple applications from scratch in the early stages of the client’s journey, providing a strong foundation for the future. The systems we built are resilient and scalable enough to serve the data protection needs of the hundreds of thousands of users the client plans to rapidly scale up to.
Multiple internal systems, vendor systems, third-party APIs, and background processes were integrated behind the scenes to provide a seamless user experience.
From an Ethereum ERC-20 token-based blockchain, big data, and efficient API gateway management at the data level to Bootstrap, Vue, Laravel, and Lambda functions, the Synergenie team often worked outside its comfort zone on new and unfamiliar technologies.
AWS Lambda functions, auto-scaled EC2 instances, a Kong-based API gateway, and multi-node Cassandra and Spark clusters improved scalability, resiliency, and performance.
