Right Partner for Blockchain Programs

Blockchain technology holds the promise of revolutionizing many of the transaction-based processes that underpin much of the global economy. Many businesses fear that the cost of blockchain implementation will be too high and will lead to excessive teething problems should they choose to integrate a blockchain solution into their existing infrastructure. While these concerns are understandable, blockchain implementation for businesses of all sizes doesn't have to be either expensive or traumatic.

Let's cut through the hysteria with a structured approach based on the following steps:

Step 1 – Look for a partner who has done this before.

We have five years of experience delivering solutions in the FinTech and Media sectors, and we have conducted various advisory and public events to share our knowledge with industry strategists. Below is a table of programs at varying levels of maturity, from production-ready deployments to initial PoC use cases.

Step 2 – Evaluate the maturity and skills of the implementation partner

Leverage industry-proven skill sets with the right technology depth and knowledge. Here is a mix of current marketplace solutions available for Blockchain implementations.

Our Center of Excellence focuses on the key pillars for seamless delivery, mapping them to the key skills required for successful implementation.

Step 3 – Partner’s approach

We would leverage the Rapid Prototyping Lab (RPL).

Our approach includes:

Explore Blockchain opportunities

  • Discovery sessions with Blockchain start-ups and successful adopters
  • Further exploration sessions in local labs to select the ideas and scenarios to test
  • Leverage our Digital partnerships for research

Test and validate scenarios

  • Leverage our Platform Framework to validate Blockchain platforms
  • Build and test Blockchain scenarios, leveraging PoC scenarios
  • Set up a joint Blockchain team to validate new scenarios (leveraging offshoring)

Create and share knowledge

  • Leverage points of view to create specific content
  • Set-up webinars on Blockchain to further inform your organization
  • Facilitate specific collaboration with Vendors on Blockchain development (e.g. Chain, Symbiont)

Scale expertise

  • Build specific training to turn your employees into Blockchain experts
  • Leverage our offshore Blockchain practice to scale quickly when required

 

Step 4 – Program Governance Evaluation

Our goal is to set up a concrete vision and implementation strategy, combined with a strategic testing model (proof-of-concepts) run at a structured cadence. We also respect the transition and knowledge-sharing plan.

 

SEC Commissioner, Blockchain, Crypto$

Great meeting with FinTechs who are investing over 10% of their resources in compliance. We heard the SEC commissioner (aka "Crypto Mom") and fellow crypto compliance strategists speak on institutional adoption and gaining regulatory approvals. There were interesting thoughts on how exchange benchmarking of crypto can evolve traditional trading algorithms to expose manipulation, driven by data and analytics. Check out our insights on our Crypto Asset Rating framework.

 

 

 

Cloud Native Data Management

You've likely heard about "cloud-native apps." That term refers to software built for change, scale, resilience, and manageability. Oftentimes it's equated with microservices and containers, but those aren't required. Whether running in public or private clouds, a cloud-native app takes advantage of the elasticity and automation offered by the host platform.

But what are the implications for the data capabilities those apps depend on? While cloud-native apps have blueprints like the twelve factor criteria to steer design, your data services don’t.

To better understand this, let's look at a domain-based architecture and deployment plan, and define the Master Data Management (MDM) context within the cloud-native picture.

Master data is all non-transactional data within an enterprise – for example, customers, products, materials and locations. It serves as a foundation for businesses to build analytical capabilities and more effectively manage operations. An MDM platform allows businesses to manage, govern and analyze all of their master data within a single platform. This helps them develop new insights about their business, have confidence in data quality, increase productivity and improve the customer experience.

Sample Scenario: We are trying to decouple a monolith architecture into microservices using the levers of DDD (Domain-Driven Design), access patterns for performance, and cross-domain references to persistent data using keys. Let's look at how this plays out.
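To make the cross-domain-reference lever concrete, here is a minimal Python sketch. The Product, Customer, and Order domains and their fields are hypothetical (not taken from the scenario above): each bounded context owns its own data, and other contexts hold only keys that are resolved through the owning service.

```python
from dataclasses import dataclass

# Each bounded context owns its own aggregate; other contexts hold only keys.

@dataclass
class Product:            # owned by the Product Master service
    product_id: str
    name: str
    price: float

@dataclass
class Customer:           # owned by the Customer Master service
    customer_id: str
    name: str

@dataclass
class OrderLine:          # owned by the Order service
    product_id: str       # cross-domain reference by key, not by object
    quantity: int

@dataclass
class Order:
    order_id: str
    customer_id: str      # key into the Customer domain
    lines: list

# The Order service never joins against Product tables directly;
# it resolves keys through the owning service's API when it needs details.
def describe(order: Order, product_lookup) -> str:
    names = [product_lookup(line.product_id).name for line in order.lines]
    return f"Order {order.order_id} for customer {order.customer_id}: {', '.join(names)}"
```

Because an order stores only `product_id` and `customer_id`, the Product and Customer schemas can evolve and scale independently, which is exactly what the decoupling is after.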

Let's define the monolith application

 

Defining the Data Domains

Product Master

This compiles, validates, enriches, and curates all your organization's product-related data into a complete, accurate, and easily reportable golden copy. Some examples of product-related data include product types, product lines and groups, product pricing (billing) schemes, product hierarchies, and historical product details. A good product master, however, will establish the right workflows and critical data elements (CDEs) within product attributes. It will also identify processes by which to ingest and govern that product data, regardless of how that may be defined within the organization.
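As a loose sketch of how such a golden copy might be assembled, assuming invented critical data elements and a simple newest-complete-value-wins merge (neither is prescribed above):

```python
# Hypothetical CDE check and golden-record merge for product data.
CRITICAL_DATA_ELEMENTS = ["product_id", "name", "product_line", "price"]  # assumed CDEs

def has_all_cdes(record: dict) -> bool:
    """A record is usable only if every critical data element is populated."""
    return all(record.get(cde) not in (None, "") for cde in CRITICAL_DATA_ELEMENTS)

def golden_copy(records) -> dict:
    """Merge duplicate records for one product; newer, non-empty values win."""
    merged = {}
    for record in sorted(records, key=lambda r: r.get("updated_at", "")):
        for key, value in record.items():
            if value not in (None, ""):
                merged[key] = value
    if not has_all_cdes(merged):
        raise ValueError(f"Golden record missing CDEs: {merged.get('product_id')}")
    return merged
```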

Customer Master

Knowing your customer helps you to push the right content through the right channel at the right time in your customer’s purchase lifecycle. A customer master solution enables you to understand and adapt to evolving customer needs and analyze and nurture customer relationships for both B2B and B2C contexts.

Vendor Master

This manages vendor data alongside product and customer data. You can maintain relationships between customer, vendor and product domains on a single platform by standardizing the vendor creation process across brands and geographies. Enrich vendor data through address verification and maintain the Tax Jurisdiction Code by integrating with third-party providers.

Let's look at the cloud-native architecture

Cloud native Transformation

Integrated Solution

Making this performant

Logical View of the final architecture

Key Characteristics

  • Simple Dataset
  • Highly Scalable
  • Optimized Performance Tuning
  • Dedicated Caching Backing Service
  • Remove inter-team Data Model Dependencies
  • Allow iterative improvement

Tool Selection Strategy

  • In-memory data grid (IMDG) based on Apache Geode – e.g., for the CQRS pattern
  • Scalability, availability, and performance
  • Supports many design patterns – side cache, inline cache, HTTP session management, event-driven, data-aware compute (M/R); a minimal side-cache sketch follows this list
  • Supports data replication (multi data center) – Active/Active, active-passive, DR
  • CI/CD friendly
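Here is a minimal, vendor-neutral sketch of the side-cache (cache-aside) pattern in Python. It only illustrates the control flow; a real IMDG client such as Geode's has its own API, and the TTL, key format, and loader below are assumptions.

```python
import time

class SideCache:
    """Cache-aside: the application checks the cache first and, on a miss,
    loads from the system of record and populates the cache itself."""

    def __init__(self, ttl_seconds: float = 300.0):
        self._entries = {}          # key -> (value, expires_at)
        self._ttl = ttl_seconds

    def get(self, key, loader):
        entry = self._entries.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]                      # cache hit
        value = loader(key)                      # cache miss: go to the database
        self._entries[key] = (value, time.monotonic() + self._ttl)
        return value

    def invalidate(self, key):
        self._entries.pop(key, None)             # call after writes to the database

# Usage: cache.get("customer:42", lambda k: customer_db.load(k))
```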

How does Kyma solve your deployment challenges?

  • Monitoring and alerting are based on Prometheus and Grafana (see the instrumentation sketch after this list)
  • Logging is based on Loki
  • Eventing uses Knative and NATS
  • Asset management uses Minio as a storage
  • Service Mesh is based on Istio
  • Tracing is done with Jaeger
  • Authentication is supported by dex
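As a small illustration of the monitoring pillar, a service running on such a cluster can expose metrics for Prometheus to scrape using the prometheus_client library; the metric names, port, and workload below are assumptions for the sketch.

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

# Hypothetical metric names; Prometheus scrapes them from /metrics on port 8000.
REQUESTS = Counter("orders_requests_total", "Total order requests handled")
LATENCY = Histogram("orders_request_seconds", "Order request latency in seconds")

def handle_request():
    with LATENCY.time():                          # records duration into the histogram
        REQUESTS.inc()
        time.sleep(random.uniform(0.01, 0.05))    # stand-in for real work

if __name__ == "__main__":
    start_http_server(8000)                       # exposes the /metrics endpoint
    while True:
        handle_request()
```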

Food for thought – below, we look at ten characteristics of cloud-native data and why each one helps you deliver better software.

Cloud-native data is…

  1. Stored in many ways.
  2. Independent of fixed schemas.
  3. Duplicated.
  4. Integrated via service interfaces.
  5. Self-service oriented.
  6. Isolated from other tenants.
  7. At home on managed platforms.
  8. Not afraid of scale (out).
  9. Often used and discarded.
  10. Analyzed in real-time and batch.

Mainframe to Cloud Native

By migrating mainframe or legacy workloads to cloud-native platforms, you move them to open-systems environments. The advantage is that there is no change to the existing legacy program business logic. It becomes possible to effectively leverage critical data, enjoy a flexible, transparent, and modern environment, and save money on mainframe contracts.

Key drivers:

  • Old application logic – poor documentation
  • Trying to reverse engineer COBOL, PL/I, Assembler, JCL, Easytrieve, IDMS, DB2, IMS, and so on
  • Tightly coupled and brittle architecture
  • Expensive upkeep (licensing & maintenance)
  • Unable to re-host or offload batch workloads
  • Desire to modernize

Migration Strategies:

Rewrite – Migrate the data off the mainframe and completely rewrite the business processes in modern frameworks. Change the source language of the original application, for example, from COBOL or Natural to Java or C#.

Re-host – “Lift and Shift” the applications and data from the mainframe to Unix or Windows with interface emulation or automatic code translation tools.

ReInterface – Keep the business logic on the mainframe in its present form, but unlock it by exposing it via REST APIs and web services (a minimal facade sketch follows these strategies).

Replace/Retire – Replace the business processes with packaged applications such as SAP, PeopleSoft, and Oracle; this involves gap analysis and data conversion. Take an inventory of applications and retire those that no longer serve a purpose.
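Referring back to the ReInterface strategy above, here is a rough sketch of a thin REST facade in Python/Flask. The transaction name and the bridge into the mainframe are placeholders; in practice the call would go through MQ, CICS web services, or a screen-scraping emulator rather than the stub shown.

```python
from flask import Flask, jsonify

app = Flask(__name__)

def call_mainframe_transaction(txn_name: str, account_id: str) -> dict:
    """Placeholder for the real bridge (MQ, CICS web services, a TN3270
    screen-scraper, etc.); here it just returns canned data."""
    return {"transaction": txn_name, "account": account_id, "balance": "1234.56"}

@app.route("/accounts/<account_id>/balance", methods=["GET"])
def get_balance(account_id):
    # The COBOL business logic stays on the mainframe; we only expose it.
    result = call_mainframe_transaction("ACCTBAL", account_id)
    return jsonify(result)

if __name__ == "__main__":
    app.run(port=8080)
```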

Here is a table to compare the maturity of various approaches

 

We recommend a phased delivery with accelerator tools and partners. This means co-hosting with the mainframe while re-platforming tools handle code migration, followed by a gradual release to a fully domain-based microservices environment.

Re-platforming by automating code conversion

Phases of delivery – Phase 1

Phases of delivery – Phases 2-4

Point-of-view on alternative approach

Alternative approaches for Re-hosting, Batch job migration, and Re-Engineering

Migrating Mainframe to Cloud

Adopting DataOps

As an umbrella term, DataOps can refer to a range of related technologies and processes, and is as much about cultural and organizational change as it is about the adoption of emerging data management products and services. For the purposes of this paper and its associated survey, DataOps is defined as follows:

DataOps is the alignment of people, process and technology to enable more agile and automated approaches to data management. It aims to provide easier access to data to meet the demands of various stakeholders who are part of the data supply chain (developers, data scientists, business analysts, DevOps professionals, etc.) in support of a broad range of use cases.

DataOps is about reducing the complexity involved in data provisioning and enabling self-service access to data in order to accelerate the development of data-driven applications and data-driven decision-making, as well as supporting business agility in response to rapidly changing business requirements.

A key element of this is reducing ‘data friction,’ which arises when the demands of data consumers (such as data analysts, developers and senior decision-makers) are not met by data operators (e.g., data management and IT professionals). This is a perennial problem that has been exacerbated in recent years by the growing volume of data, as well as the increased number of data sources and use cases that results from escalated demand from application development and analytics projects. The term ‘DataOps’ itself has not yet become mainstream. However, the associated concepts and technologies are being widely adopted by enterprises as they seek to become more data-driven.

The value of DataOps has been most visible in DevOps environments where velocity is paramount and developers have a growing role in determining data access and usage requirements, as well as influencing the choice of more agile data management products and services. However, we see that DataOps is also penetrating other business processes, driven by the similar growing influence of data scientists and data engineers, for example. As such, DataOps is also related to the need for new cultural and organizational approaches to data management associated with the shift toward being more data-driven. This includes the formation of cross-functional collaborative teams that combine data scientists, data engineers and data analysts along with data management and IT professionals.

Here is our take on adoption of DataOps

Step 1: Get familiar with the DataOps Manifesto – through firsthand experience working with data across organizations, tools, and industries, we have uncovered a better way to develop and deliver analytics, which we call DataOps.

Step 2: Everyone on the DataOps team needs at least one of the following tools to acquire, organize, prepare, and analyze/visualize data:

  • Data access/ETL (e.g., Talend, StreamSets)
  • Data catalog (e.g., Alation, Waterline)
  • Enterprise data unification (e.g., Tamr)
  • Data preparation (e.g., Alteryx, Trifacta)
  • Data analysis and visualization (e.g., Tableau, Qlik)

Step 3: Align to the DataOps goals (a small repeatability sketch follows this list):

  • Continuous model deployment
  • Promote repeatability
  • Promote productivity – focus on core competencies
  • Promote agility
  • Promote self-service
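As a hedged sketch of the repeatability goal, a pipeline step can ship with automated data checks that run on every execution (for example, in CI before promotion). The table, checks, and expected values below are invented for illustration.

```python
import sqlite3

# Hypothetical quality gates; in a real pipeline these run in CI before promotion.
CHECKS = [
    ("no null customer ids", "SELECT COUNT(*) FROM orders WHERE customer_id IS NULL", 0),
    ("row count sanity",     "SELECT COUNT(*) FROM orders", None),  # None = just report
]

def run_checks(conn: sqlite3.Connection) -> bool:
    ok = True
    for name, query, expected in CHECKS:
        value = conn.execute(query).fetchone()[0]
        if expected is not None and value != expected:
            print(f"FAIL {name}: got {value}, expected {expected}")
            ok = False
        else:
            print(f"ok   {name}: {value}")
    return ok

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id TEXT, customer_id TEXT)")
    conn.execute("INSERT INTO orders VALUES ('o1', 'c1')")
    raise SystemExit(0 if run_checks(conn) else 1)
```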

Step 4: Match DevOps and DataOps with the processes and tools

 

PostgreSQL CoE

CoEs are often created when there is a knowledge deficit or skills gap within an organization. For example, a company may form a new center of excellence to manage the adoption and integration of robotic process automation (RPA).

An important goal of a center of excellence is to eliminate inefficiency and help move the organization to the next level of a maturity model. A CoE should include representatives from management, a line of business (LOB) and information technology (IT).

Depending upon the organization and area of interest, a CoE may be ongoing or temporary. When a CoE is ongoing, team members often have other job responsibilities; when the CoE is temporary, team members may be relieved of normal duties for the duration of the CoE.

Key characteristics for the PostgreSQL CoE are:

Domain CoEs

  • In touch with the market for the latest trends and updates
  • Knowledge and capability building through knowledge acquisition, trainings, and partnerships
  • Work closely with the Technology CoE to help develop innovative solutions
  • Create reusable scenarios on industry products

RPL Labs

  • RPL labs across the globe, with a major center in NYC
  • Develop innovative solutions for clients and create reusable scenarios on industry products and packages
  • Build frameworks and competencies on tools and technologies
  • Collaborate with leading tool vendors to develop solutions

Technology CoE

  • Tool evaluation, Commoditizing Skills on new solutions & services
  • Building Frameworks and competencies on Tools and Technologies and Methodologies
  • Technology Change Management & Ensuring process innovations
  • Creating different technology solution accelerators

The figure below shows the intersection of roles and capabilities

 

There are several quantitative and qualitative benefits that can be reaped from a CoE:

  • Establish visibility and focus: Having a dedicated team means that employees and the world are aware of the focus. CoEs also demonstrate that an organization is committed to excellence.
  • Eliminate risks: CoEs are not impacted by the day-to-day business of an organization. They do not have to deal with immediate revenue pressures and are free to channel their expertise and focus on a single business activity, process, or capability.
  • Flexibility: CoEs are not limited by all the process overheads that typical product teams may have to deal with. They have the flexibility to innovate fast and fail fast.
  • Monitor Costs: CoEs make it easier to control and track costs. They help organizations measure KPIs and outcomes, especially around RoI. They also help in planning and defining the focus areas for the future.

The figure below shows the right mix of team members for the CoE

NY Blockchain Seminar

In all honesty, there is a lot to look forward to in the coming year(s) beyond technology. This blog is an attempt to summarize my learnings from the various seminars I presented this year on the intersection of blockchain, AI, and big data, and where I believe the focus will be in the upcoming years. There is a wealth of information online and in my earlier blogs covering the basics of these technologies, so they are not in scope for this blog.

I saw maturity in my audiences ranging from beginners to seasoned investors, and the goal was to move past the crypto-craze hype to a discussion of the core components of these platforms. I believe this kind of understanding is key, as it is the differentiator and the intelligence one can gain in cryptocurrency-centric markets.

Cryptocurrencies go beyond blockchain; for instance, IOTA uses the "tangle," a construct based on a directed acyclic graph, to overcome limitations of the blockchain architecture. Regardless, the underlying strategy is a decentralized data platform and infrastructure conforming to notions of trust and validity. IPDB, a planetary-scale blockchain database, gets us closer to the big data characteristics of storage engines, with BigchainDB being one such engine capable of holding the actual data and connecting databases via interledger protocols for interoperability.

So, back to my topic of what to expect this coming year. Here are a few items that I consider to be upcoming enterprise-level trends, excluding developer tools.

Sharing data – multinational enterprises, or making trusted data public: For an industry consortium, each company owns a particular node or nodes to manage competition, and single-source-of-truth public data can be shared on an IPDB database. There are numerous benefits, including gaining new insights and increasing profitability through data monetization. Ocean Protocol is one such promising project.

Audit and Control – solving AML, KYC, and double-spending: This will rely on the above and on how real-time analytics is applied to make decisions about pseudonymous data. It is both an enabler of and a measure for the adoption of blockchain in financial services firms. The maturity will come from middle- and back-office internal processes, and it has already begun at global financial services firms.

Intelligent Portfolio of Cryptocurrency – With a total market cap of $600B and 1,400 coins/cryptocurrencies being traded, there is a need to understand the trends and influencers in this industry. I saw a few products come up, but they are limited to a handful of coins/tokens; the space needs to be understood in depth and explained in a language an average investor can understand. Prominent financial services firms de-risking out of cryptocurrency investment is pushing a large population toward gray-market or illicit means of participating in this economy. I would add AI-driven DAOs to this category as well.

Expansion of Use Cases – We saw a variety of use cases for blockchain, and I expect this to grow even further, but now with a more regulated and informed approach that conforms to all of the above. I am hoping to see traction in the areas of public security and safety, and in monitoring the gray market.

IT Transformation Strategy

IT transformation is inevitable, and the technology refresh cycle is becoming more and more aggressive and competitive. Open source has gained the trust not only of public-sector enterprises but also of more regulated businesses and organizations. The CIO's office is constantly pushing for more innovative ideas, cost savings, and audits of existing systems. Its guiding principles focus on evaluating open source solutions, a cloud-first approach, avoiding vendor lock-in, being agile and scalable, and all-encompassing security. The Big Data ecosystem has been following these principles very closely. Its contributors have seen transformation as a necessity to solve their specific needs and to help others in similar situations.

Change is driven by the consolidation of IT assets and legacy investments that have occurred in the past. The value proposition is even more attractive when organizations have control over licenses, users, and application teams using the right and preferred technology.

When should an organization realize a need for IT transformation?

The answer is that it should be an ongoing process: evaluating technologies, existing systems, and infrastructure must occur constantly, but in alignment with the overall business priorities. Those priorities are not just "save me money"; they also map to where the business wants to grow, the customer experience, and new market discovery.

Here is an example from my experience with an insurance firm. Being a highly regulated business, introducing new products and changing existing products was always a topic of scrutiny, not just from customers but also from each of the state governments and other regulatory institutions. For us to succeed and maintain our current business, it was necessary to first understand what kind of data assets we had, especially as a company that had been in business for many decades. During our discovery phase, it was important to map each component to a business process and identify what we called Business Improvement Opportunities (BIOs). The result was a scalable business data lake catering to various lines of business, with an ability to help consolidate existing commercial systems and software technologies.

The strategy we followed for multiple customers has now matured into a solution accelerator, helping us jump-start similar engagements. We categorize each of the discovery phases into an Enterprise Architecture framework, as shown in the picture. It is a top-down model inclined toward a "Business Domain Driven Architecture." Each of the BIOs maps to its relevant data attributes, which can take the form of a more complex data architecture model like Data Vault or a more traditional data warehouse based on Kimball or Inmon. The application layer enables presentation, processing, and maintenance of these data structures based on business needs and priorities. The technology layer hosts these systems, that is, the data center; this is where encryption and security are set up. Skilled technicians spend time on performance and monitoring options to optimize applications and recommend how data structures can or should be designed to achieve better business results.

For each of these architecture layers, a multi-phase approach is recommended. Phase one is to conduct an early assessment of existing systems and IT assets: gather a vision statement from the business stakeholders and list current pain points, challenges, and future enhancements via workshops and user interviews. Phase two is to combine these discoveries into workable units of work. Areas that require an evaluation of technologies translate into proof-of-concepts, a list of "nice-to-haves" goes into a feature list, and business use cases become user stories. Phase three brings everything together, taking these results and producing recommendations and an implementation plan backed by facts and results.

Of course, each customer engagement has been very customized and different. The use cases have varied from a Big Data Hadoop data lake to a zero-loss microservices platform. In each of these situations, success was realized through results and proof-of-concepts, showcasing the value behind such transformations.

Elastic {ON} & Replacing Google Search Appliance

Elastic {ON} has always been a great place to witness the sheer excitement shared by developers and their management teams around Enterprise Search technologies. As a leading partner and sponsor of such events, we get a first-hand look at the adaptability of this technology and how well it fits a diverse set of use cases.

One such use case is identifying an alternative to the Google Search Appliance (GSA). Over the past few months, Google has discontinued the sale of this product, with end-of-life support running until 2018. While an appliance going out of support or becoming obsolete is not uncommon, it does put pressure on the teams supporting or managing such platforms.

There are close to a dozen options in the market, one being a possible cloud-based solution offered by Google to its existing GSA customers. Most of the others are either another appliance or subscription-based software.

With that said:

How do you decide what’s best?

What are the criteria for evaluation?

Did the existing appliance fit well in the original TCO plan?

and most importantly,

Will you be in a similar situation, repeating the whole process again in the next few years?

Let us take a closer look at this use case.

If it were a cloud replacement, wouldn't you rather start monitoring your GSA usage to see if a cloud-based solution is acceptable? And while at it, wouldn't you also start investigating what the gaps are in the current environment and what the areas of improvement are? Of course, all of this happens on top of mapping the new solution to existing features.

The most efficient approach must be tailored specifically to an Enterprise Search domain mapped to your organization. Start by putting together an inventory of current data sources, content security requirements, legal and regulatory compliance needs, and what is important to your end users. Each line item needs to be converted into an evaluation use case that maps to success criteria and a broader product feature list. Prioritize the features applicable to you, such as incremental directory and website crawl, auto spell check, relevancy boosting, language support, caching, and, most recently, expert search. Combine all these requirements and map them to a broader set of proof-of-concept (PoC) use cases.

While you start reaching out to the leaders in Gartner's Magic Quadrant, I propose you take a specific look at Elasticsearch as one such option. My previous blog, "Beyond Enterprise Search," gives an overview of a few use cases. However, the main differentiator is its strong open source community of committers and its wide acceptance by most of the top enterprises across the world.
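For instance, the relevancy-boosting item from the PoC list above can be scripted directly against Elasticsearch's standard _search endpoint. This is a sketch only; the index name, fields, and boost values are assumptions, and a production test would run against your own corpus.

```python
import requests

ES_URL = "http://localhost:9200/intranet-docs/_search"   # assumed index name

# Boost title matches over body matches, mirroring a typical GSA relevancy rule.
query = {
    "query": {
        "bool": {
            "should": [
                {"match": {"title": {"query": "expense policy", "boost": 3}}},
                {"match": {"body": "expense policy"}},
            ]
        }
    },
    "size": 10,
}

response = requests.post(ES_URL, json=query, timeout=10)
response.raise_for_status()
for hit in response.json()["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("title"))
```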

I do agree that comparing a one-size-fits-all black box to a continuously improving open-source-based platform is challenging, but when done the right way it is just as rewarding.

In addition to the GSA capabilities I mentioned earlier, the Elastic platform can be further extended with: scale-out architecture, Big Data integration, role-based access control, encryption, LDAP/AD integration, ACL authorization, auditing, machine learning (contextually relevant search results), monitoring and alerting, a recommender engine, and advanced analytics (time series, graph relationships, and more). As with most open source technologies, do account for customization as part of your overall migration plan, and include a few notable third-party partner solutions that are proven migration accelerators.

From a TCO perspective, the technology is very reasonable when bought under a support license. Third-party plugins and APIs would practically pay for themselves within the first year when compared to building those APIs from scratch. On the question of when this becomes obsolete: as part of an open-source software project, improvements are added constantly, with no vendor lock-in at your end. When the time does come to move to a more modern alternative, such a platform will not become a critical cost and will give your teams enough time to evaluate newer solutions.

Design Patterns

The Big Data ecosystem is a never-ending list of open source and proprietary solutions, and in my view nearly all of them share the common roots and fundamentals of the good old platforms we grew up with. With that as the basis, our topic for today is architecture and design patterns in the Big Data world. This is also the first blog in my series "Driven by Big Data," covering some of the more complex and exciting topics I see with various clients.

Let’s get started:

Most of us IT professionals have, now or in the past, seen at least one big data problem, and most of us have tried to solve it. In my experience, there are plenty of reasons why those projects ended up as our happy-hour conversations for years.

So how did we solve those problems?

It may have taken us a few heuristic techniques and prototypes, and ultimately solving for a repeatable pattern our scripts could handle. The same patterns lead to robust architecture frameworks, Lambda and Kappa being the two most prominent. Both strive for a fault-tolerant system with balanced latency and optimum throughput that conforms to the V's of Big Data.

Lambda architecture consists of three layers: batch computation, speed (or real-time) processing, and a serving layer for responding to queries. It takes advantage of batch and stream processors working alongside each other to mitigate lost data sets or latency. However, complexity arises when we try to maintain two different systems that are supposed to generate the same results. This led to a more streamlined approach, Kappa. Here, every data stream feeds through a streaming layer with the ability to replay a transaction into multiple serving data sets. I include a handy table with a few recommended projects; I am sure there are more, so feel free to reply and share.

That was easy, wasn’t it?

Let us look at a production scenario where we may have unpredictable data patterns and a few out-of-sequence event streams caused by skew, server latency, or failure. How do we decide which architecture to go with? Will it be Kappa for its simplicity, or Lambda?

The answer is to look at the "life of an event" carefully. This includes the size of each message, the type of event producer, the capacity of the consuming system, and how the user will receive the information. One option is to cache each transaction in the stream event store and replay the event in case of any discrepancies (Kappa); another is to store messages in a NoSQL data store by batch and periodically perform an external lookup in the stream to check for dropped messages (Lambda). A design choice like this can have a caveat, though: sacrificing latency for a "no-loss" system. My advice is to decide not just based on what the technology can do, but also on what the business requires. If I were a financial institution, I would never want a missed transaction, as opposed to a clickstream use case that calls for a lower-latency solution.
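To ground the replay idea, here is a toy Python sketch of a Kappa-style append-only event log whose serving view is derived entirely by replay; it illustrates the principle only and is not a substitute for a real stream store such as Kafka.

```python
from collections import defaultdict

class EventLog:
    """Append-only log; serving views are rebuilt by replaying from offset 0."""

    def __init__(self):
        self._events = []                 # stands in for Kafka / a stream store

    def append(self, event: dict) -> int:
        self._events.append(event)
        return len(self._events) - 1      # offset of the appended event

    def replay(self, from_offset: int = 0):
        yield from self._events[from_offset:]

def build_balance_view(log: EventLog) -> dict:
    """Serving layer: derived entirely from the log, so it can be thrown away
    and recomputed if the projection logic changes or events arrive late."""
    balances = defaultdict(float)
    for event in log.replay():
        balances[event["account"]] += event["amount"]
    return dict(balances)

log = EventLog()
log.append({"account": "A", "amount": 100.0})
log.append({"account": "A", "amount": -25.0})
print(build_balance_view(log))            # {'A': 75.0}
```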
