Sunday, October 30, 2022

Create a temp URL valid for one minute only for file on Azure Blob Storage

 

How do you generate a temporary (shared access signature) URL in Azure using C#?


        // Uses the legacy WindowsAzure.Storage package:
        // using Microsoft.WindowsAzure.Storage;
        // using Microsoft.WindowsAzure.Storage.Blob;

        CloudStorageAccount account = CloudStorageAccount.Parse("yourConnectionString");
        CloudBlobClient serviceClient = account.CreateCloudBlobClient();

        CloudBlobContainer container = serviceClient.GetContainerReference("yourContainerName");
        container.CreateIfNotExistsAsync().Wait();

        CloudBlockBlob blob = container.GetBlockBlobReference("test/helloworld.txt");
        //blob.UploadTextAsync("Hello, World!").Wait();

        SharedAccessBlobPolicy policy = new SharedAccessBlobPolicy
        {
            // the signature expires one minute from now
            SharedAccessExpiryTime = DateTime.UtcNow.AddMinutes(1),

            // read-only access
            Permissions = SharedAccessBlobPermissions.Read
        };

        // create the shared access signature (SAS) token
        string signature = blob.GetSharedAccessSignature(policy);

        // the full temporary URL is the blob URI plus the SAS token
        Console.WriteLine(blob.Uri + signature);



Starting with Entity Framework Core 5.0 (released alongside .NET 5), you can retrieve the SQL statement generated for a LINQ query without executing it by calling the ToQueryString() method on an IQueryable.

Sunday, October 23, 2022

DB2 Replication Options


1) Db2 High Availability Disaster Recovery (HADR):
Active/passive replication; supports up to three remote standby servers.
When the active database goes down, a standby database can take over in seconds. The original primary database can then be brought back up and returned to its primary role, which is known as failback. A failback can be initiated once the old primary database is consistent with the new primary database. After the old primary is reintegrated into the HADR setup as a standby database, the database roles are switched so that the original primary becomes the primary again.





2) Db2 pureScale: designed for continuous availability. All software components are installed and configured from a single host. pureScale scales your database solution across multiple database servers, known as members, which process incoming database requests; these members operate in a clustered system and share data. You can transparently add more members to scale out to meet even the most demanding business needs. There are no application changes to make, no data to redistribute, and no performance tuning to do.



3) IBM InfoSphere Data Replication (IIDR):

IIDR offers three alternative components:

  1. Change Data Capture (CDC): for heterogeneous databases, i.e., replication between Oracle and Db2.
  2. SQL Replication: the older approach, used in broadcast topologies; it creates staging tables in the source database to capture all changes, which increases the database size.
  3. Q Replication: uses IBM MQ to carry all database changes as messages; high volume, low latency.






    Q Replication: the best solution in IIDR

    Q Replication is a high-volume, low-latency replication solution that uses WebSphere MQ message queues to transmit transactions between source and target databases



    Q Replication High availability scenarios

    1. Two nodes for failover: update workloads execute on a primary node; the second node is not available for any workload.
    2. Two nodes with one read-only node for query offloading: update workloads execute on a primary node; read-only workloads are allowed on a second node.
    3. Two nodes, active/active, with strict conflict rules: update workloads execute on two different nodes; deployed only when conflicts can be carefully managed.
    4. Three nodes with at least one read-only node: update workloads execute on a primary node; read-only workloads execute on the second and third nodes; conflicts are tightly managed.
    5. Three nodes, active/active, with strict conflict rules: update workloads execute on three different nodes; conflicts are managed using data partitioning and workload distribution; use when connection topologies are unstable or slow.


    Q Replication components

    1) The Q Capture and Q Apply programs and their associated DB2 control tables (listed as Capture, Apply, and Contr in the diagram)

    2) The Administration tools that include the Replication Center (db2rc) and the ASNCLP command-line interface

    3) The Data Replication Dashboard and the ASNMON utility that deliver a live monitoring web tool and an alert monitor respectively

    4) Additional utilities like the ASNTDIFF table compare program and the asnqmfmt program to browse Q Replication messages from a WebSphere MQ queue


    Notes:

    - The Q Capture program is log-based
    - The Q Apply program applies multiple transactions in parallel to the target DB2
    - The Q Capture program reads the DB2 recovery log for changes to a source table defined to replication. The program then sends transactions as WebSphere MQ messages over queues, where they are read and applied to target tables by the Q Apply program.
    - Asynchronous delivery: the Q Apply program receives transactions without having to connect to the source database or subsystem. The Q Capture and Q Apply programs operate independently of each other; neither requires the other to be running.
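    The queue-decoupled capture/apply flow described above can be sketched in miniature. This is an illustrative toy only: a Python queue.Queue stands in for the WebSphere MQ queues, a list of tuples stands in for the DB2 recovery log, and all names are hypothetical, not part of IIDR.

```python
import queue

def q_capture(log_records, mq):
    # "Capture" side: read committed changes from the (fake) recovery log
    # and publish them as messages, without touching the target.
    for record in log_records:
        mq.put(record)

def q_apply(mq, target_table):
    # "Apply" side: drain the queue and apply changes to the (fake) target,
    # without ever connecting to the source database.
    while not mq.empty():
        op, key, value = mq.get()
        if op == "UPSERT":
            target_table[key] = value
        elif op == "DELETE":
            target_table.pop(key, None)

mq = queue.Queue()  # stands in for the MQ send/receive queues
source_log = [("UPSERT", 1, "alice"),
              ("UPSERT", 2, "bob"),
              ("DELETE", 1, None)]
target = {}

q_capture(source_log, mq)  # capture runs on its own schedule...
q_apply(mq, target)        # ...apply drains the queue independently
print(target)              # {2: 'bob'}
```

    Because the queue sits between the two sides, neither function ever calls the other: that decoupling is what the "asynchronous delivery" note means.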



    InfoSphere Information Server

    InfoSphere Information Server is an IBM data integration platform that provides a comprehensive set of tools and capabilities for managing and integrating data across various sources and systems. It is designed to help organizations address data quality, data integration, data transformation, and data governance challenges.

    InfoSphere Information Server enables businesses to access, transform, and deliver trusted and timely data for a wide range of data integration use cases, such as data warehousing, data migration, data synchronization, and data consolidation. It offers a unified and scalable platform that supports both batch processing and real-time data integration.


    Key components of InfoSphere Information Server include:

    1) DataStage: A powerful ETL (Extract, Transform, Load) tool that allows users to design, develop, and execute data integration jobs. It provides a graphical interface for building data integration workflows and supports a wide range of data sources and targets.

    2) QualityStage: A data quality tool that helps identify and resolve data quality issues by profiling, cleansing, standardizing, and matching data. It incorporates various data quality techniques and algorithms to improve the accuracy and consistency of data.

    3) Information Governance Catalog: A metadata management tool that enables users to capture, store, and manage metadata about data assets, including data sources, data definitions, data lineage, and data ownership. It helps organizations establish data governance practices and provides a centralized repository for managing and searching metadata.

    4) Data Click: A self-service data preparation tool that allows business users to discover, explore, and transform data without the need for extensive technical skills. It provides an intuitive and user-friendly interface for data profiling, data cleansing, and data enrichment.

    5) Information Analyzer: A data profiling and analysis tool that helps assess the quality, structure, and content of data. It allows users to discover data anomalies, identify data relationships, and generate data quality reports.

    InfoSphere Information Server provides a comprehensive and integrated platform for managing the entire data integration lifecycle, from data discovery and profiling to data quality management and data delivery. It helps organizations improve data consistency, data accuracy, and data governance, leading to better decision-making and increased operational efficiency.








    for more information visit
    https://www.youtube.com/watch?v=U_PN8QLTec8



    Tuesday, October 11, 2022

    Big O notation

     Big O notation is used to classify algorithms according to how their run time or memory space requirements grow as the input size grows.




    From the chart, O(1) has the least complexity, and O(n!) is the most complex.


    Time Complexity

    An algorithm is said to run in:

    1) Constant time (written as O(1)) if its running time is bounded by a value that does not depend on the size of the input. For example, accessing any single element in an array takes constant time, as only one operation has to be performed to locate it. Similarly, finding the minimal value in an array sorted in ascending order takes constant time: it is the first element. However, finding the minimal value in an unordered array is not a constant-time operation, since every element must be scanned to determine the minimum; that is a linear-time operation, taking O(n) time.

    2) Logarithmic time (O(log n)), commonly found in binary trees and binary search.
    An example of logarithmic time is dictionary search: consider a dictionary D which contains n entries, sorted in alphabetical order; each comparison rules out half of the remaining entries.

    3) Linear algorithm – O(n) – Linear Search.

    4) Superlinear algorithm – O(n log n) – Heap Sort, Merge Sort.

    5) Polynomial algorithm – O(n^c) – Selection Sort, Insertion Sort, Bucket Sort.
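    The gap between the linear and logarithmic classes above can be made concrete by counting comparisons on the same sorted input. A small sketch (function names are mine, not from any library):

```python
def linear_search(arr, target):
    # O(n): examine elements one by one, counting comparisons
    steps = 0
    for i, v in enumerate(arr):
        steps += 1
        if v == target:
            return i, steps
    return -1, steps

def binary_search(arr, target):
    # O(log n): halve the search range on every comparison
    steps, lo, hi = 0, 0, len(arr) - 1
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid, steps
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps

data = list(range(1024))                      # sorted input, n = 1024
_, linear_steps = linear_search(data, 1023)   # worst case: last element
_, binary_steps = binary_search(data, 1023)
print(linear_steps)  # 1024 comparisons: proportional to n
print(binary_steps)  # 11 comparisons: proportional to log2(n)
```

    Doubling n to 2048 would double the linear count but add only one more comparison to the binary count.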



    Space Complexity measures the amount of memory an algorithm uses. An algorithm is said to be:

    1) Ideal algorithm - O(1) - Linear Search, Binary Search, Selection Sort, Insertion Sort, Heap Sort.

    2) Logarithmic algorithm - O(log n) - Top-down merge sort for a linked list.

    3) Linear algorithm - O(n) - Quick Sort, Merge Sort with recursive merge.

    4) O(n+k) - Radix Sort (linear in the input size n plus the key range k).


    Merge sort can consume O(log n), O(n), or O(1) stack space.
    A top-down merge sort for a linked list consumes O(log n) stack space,
    and it is slower than a bottom-up approach because of the scanning needed to split the lists; with a recursive merge(), merge sort can take O(n) stack space.
    A bottom-up merge sort for a linked list uses a small (25 to 32 entry) fixed-size array of references (or pointers) to nodes, which meets the O(1) space requirement.
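    A minimal sketch of that bottom-up approach, with an iterative merge and the fixed-size array of node references; class and function names here are mine, chosen for illustration:

```python
class Node:
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

def merge(a, b):
    # Iterative merge of two sorted lists: no recursion, so no O(n) stack.
    head = tail = Node(None)          # dummy head
    while a and b:
        if a.value <= b.value:
            tail.next, a = a, a.next
        else:
            tail.next, b = b, b.next
        tail = tail.next
    tail.next = a or b
    return head.next

def bottom_up_merge_sort(head, slots=32):
    # array[i] holds a sorted run of about 2**i nodes, or None.
    array = [None] * slots
    while head:
        run, head = head, head.next
        run.next = None               # detach a single-node run
        i = 0
        while i < slots and array[i] is not None:
            run = merge(array[i], run)  # combine equal-sized runs
            array[i] = None
            i += 1
        if i == slots:                # overflow: reuse the last slot
            i -= 1
        array[i] = run
    result = None
    for run in array:                 # fold whatever runs remain
        if run is not None:
            result = merge(run, result)
    return result

def from_values(values):
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head

def to_values(head):
    out = []
    while head:
        out.append(head.value)
        head = head.next
    return out

sorted_head = bottom_up_merge_sort(from_values([5, 3, 8, 1, 9, 2, 7]))
print(to_values(sorted_head))  # [1, 2, 3, 5, 7, 8, 9]
```

    The only auxiliary storage is the 32-entry array of node references, regardless of list length, which is why this variant meets the O(1) space bound.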

    Link to wiki article:
    https://en.wikipedia.org/wiki/Merge_sort#Bottom-up_implementation_using_lists