
updating policy event logs with metadata #296

Open
wants to merge 4 commits into base: event-log-enhancement

Conversation

emilyhuaa

Issue #, if available:
When users enable policy event logs, they contain only IP and port information for the source and destination, so users have to look up the name and namespace of the pod behind a given IP address. This makes debugging network policy issues inefficient.

Description of changes:

This PR improves visibility by enabling the Network Policy Agent to fetch pod Kubernetes metadata through a gRPC call to a new deployment on the customer's cluster. That deployment maintains a local cache mapping pod and service IP addresses to name/namespace information from the Kubernetes API server, syncing every 10 seconds.
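
For illustration, here is a minimal sketch of what such a periodic sync loop could look like, assuming client-go and covering pods only for brevity; the package, type, and function names below (metadatacache, metadataCache, syncPods) are placeholders, not the actual implementation in this PR or the new deployment.

package metadatacache

import (
	"context"
	"sync"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

type podMetadata struct {
	Name      string
	Namespace string
}

// metadataCache maps pod IPs to name/namespace info.
type metadataCache struct {
	mu   sync.RWMutex
	byIP map[string]podMetadata
}

// syncPods rebuilds the IP -> name/namespace map from the Kubernetes API server.
func (c *metadataCache) syncPods(ctx context.Context, client kubernetes.Interface) error {
	pods, err := client.CoreV1().Pods(metav1.NamespaceAll).List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	next := make(map[string]podMetadata, len(pods.Items))
	for _, p := range pods.Items {
		if p.Status.PodIP != "" {
			next[p.Status.PodIP] = podMetadata{Name: p.Name, Namespace: p.Namespace}
		}
	}
	c.mu.Lock()
	c.byIP = next
	c.mu.Unlock()
	return nil
}

// run re-syncs every 10 seconds, matching the interval described above.
func (c *metadataCache) run(ctx context.Context, client kubernetes.Interface) {
	ticker := time.NewTicker(10 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			_ = c.syncPods(ctx, client)
		}
	}
}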

Manual Testing
I tested each of the functions; the policy event logs now look like this:
{"level":"info","ts":"2024-07-23T23:03:50.718Z","logger":"ebpf-client","caller":"utils/utils.go:106","msg":"Flow Info: ","Src IP":"192.168.42.144","Src Name":"client-5467bc68f4-zcl6g","Src Namespace":"client-5467bc68f4-zcl6g","Src Port":34765,"Dest IP":"192.168.77.216","Dest Name":"coredns-787cb67946-blpcn","Dest Namespace":"kube-system","Dest Port":53,"Proto":"UDP","Verdict":"DENY"}

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@emilyhuaa emilyhuaa requested a review from a team as a code owner August 1, 2024 17:58
@emilyhuaa emilyhuaa changed the base branch from main to event-log-enhancement August 2, 2024 17:00
Resolved review threads (outdated diffs): Dockerfile, Makefile, pkg/rpc/rpc_handler.go, pkg/utils/utils.go
func logPolicyEventsWithK8sMetadata(log logr.Logger, message *string, nodeName, srcIP string, srcPort int, destIP string, destPort int, protocol, verdict string) {
	srcName, srcNS, err := utils.GetPodMetadata(srcIP)
	if err != nil {
		log.Info("Failed to get source name and namespace metadata", "source IP:", srcIP)

Contributor:
Let's do away with this. We're already making an assumption that an unknown IP is an external IP to the cluster...

}
	destName, destNS, err := utils.GetPodMetadata(destIP)
	if err != nil {
		log.Info("Failed to get destination name and namespace metadata for", "dest IP:", destIP)

Contributor:
Same as above. Let's directly call utils.LogFlowInfo instead from capturePolicyEvents

rpc.RegisterNPBackendServer(grpcServer, &server{policyReconciler: policyReconciler, log: rpcLog})

// Connect to metadata cache service
serviceIP, err := utils.GetServiceIP(clientset, "metadata-cache-service", "default", rpcLog)

Contributor:
We're making an assumption that the aws-k8s-metadata service is available, and we exit if we fail to find such a service on the cluster. We can't exit out of the process. We should make this optional: print an enhanced log message if we find the service, and if not, print the 5-tuple info of the flow (SrcIP/SrcPort/DstIP/DstPort/Protocol) like we do today.
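
A rough sketch of the fallback behavior being suggested, assuming the agent records whether the metadata cache service was found at startup; the function name and signature are illustrative, not this PR's actual code.

package flowlog

import "github.com/go-logr/logr"

// logFlow emits the enhanced log only when the metadata cache service was found
// on the cluster; otherwise it keeps today's 5-tuple format.
func logFlow(log logr.Logger, metadataServiceFound bool,
	srcIP, srcName, srcNS string, srcPort int,
	destIP, destName, destNS string, destPort int, protocol, verdict string) {
	if !metadataServiceFound {
		// No metadata service on the cluster: log the flow exactly as we do today.
		log.Info("Flow Info: ",
			"Src IP", srcIP, "Src Port", srcPort,
			"Dest IP", destIP, "Dest Port", destPort,
			"Proto", protocol, "Verdict", verdict)
		return
	}
	log.Info("Flow Info: ",
		"Src IP", srcIP, "Src Name", srcName, "Src Namespace", srcNS, "Src Port", srcPort,
		"Dest IP", destIP, "Dest Name", destName, "Dest Namespace", destNS, "Dest Port", destPort,
		"Proto", protocol, "Verdict", verdict)
}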

Contributor:
metadata-cache-service --> let's make this a const and maybe add an aws prefix to it.
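
One possible shape for this suggestion; both the constant name and its value are placeholders for whatever naming is finally agreed on.

// Hypothetical constant for the metadata cache service name, with an aws prefix.
const awsK8sMetadataServiceName = "aws-k8s-metadata"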

Contributor:
I see that we're creating a K8s client object just to find the service IP of the metadata cache service, and there is no need to do that. We can just resolve the service name (DNS resolution) and that should provide us with the service IP.
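
A minimal sketch of the DNS-based lookup suggested here, assuming the service is reachable at the usual <service>.<namespace>.svc.cluster.local name; the function and parameter names are illustrative.

package lookup

import (
	"fmt"
	"net"
)

// resolveServiceIP resolves a Kubernetes service's ClusterIP via cluster DNS,
// avoiding the need to build a Kubernetes client just for this one lookup.
func resolveServiceIP(service, namespace string) (string, error) {
	host := fmt.Sprintf("%s.%s.svc.cluster.local", service, namespace)
	ips, err := net.LookupIP(host)
	if err != nil {
		return "", err
	}
	if len(ips) == 0 {
		return "", fmt.Errorf("no IPs resolved for %s", host)
	}
	return ips[0].String(), nil
}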

"Proto", protocol, "Verdict", verdict)

message = "Node: " + nodeName + ";" + "SIP: " + utils.ConvByteArrayToIP(rb.SourceIP) + ";" + "SPORT: " + strconv.Itoa(int(rb.SourcePort)) + ";" + "DIP: " + utils.ConvByteArrayToIP(rb.DestIP) + ";" + "DPORT: " + strconv.Itoa(int(rb.DestPort)) + ";" + "PROTOCOL: " + protocol + ";" + "PolicyVerdict: " + verdict
logPolicyEventsWithK8sMetadata(log, &message, nodeName, srcIP, srcPort, destIP, destPort, protocol, verdict)

Contributor:
We should call this only if we find the aws-k8s-metadata service running on the cluster. If not, we print the log in the current format and move on; otherwise we will see empty values for the metadata fields in the log messages, and that will look odd.

@@ -83,6 +85,18 @@ func main() {
		os.Exit(1)
	}

	k8sconfig, err := rest.InClusterConfig()

Contributor:
I don't see a need for this. I see that we rely on it for deriving the service IP of the metadata cache service; instead we can just resolve the metadata service name to get the service IP.


func LogFlowInfo(log logr.Logger, message *string, nodeName, srcIP, srcN, srcNS string, srcPort int, destIP, destN, destNS string, destPort int, protocol, verdict string) {
	switch {
	case srcN == "" && srcNS == "": // if no metadata for source IP

Contributor (@achevuru, Aug 13, 2024):
We might have to rearrange this flow. Say destN and destNS are empty as well; we would end up printing "" strings for DN and DNS. The same applies to the rest of the combinations below (see the sketch after this thread).

Also, why are we using SHOSTIP for the Source IP field here?

Author (emilyhuaa):
If it is a host-networking pod, such as the aws-node or kube-proxy pods, we agreed to just state that the IP is a host IP with no name or namespace.
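
As referenced in the thread above, here is a sketch of one way to restructure the formatting so every source/destination metadata combination is covered: format each endpoint independently rather than enumerating the combinations in a single switch. The helper names and field labels are placeholders, not this PR's code.

package flowlog

import "fmt"

// endpointFields renders one side of the flow, falling back to the bare IP when
// no pod metadata was resolved for that side.
func endpointFields(prefix, ip, name, ns string) string {
	if name == "" && ns == "" {
		return fmt.Sprintf("%sIP: %s", prefix, ip)
	}
	return fmt.Sprintf("%sIP: %s;%sN: %s;%sNS: %s", prefix, ip, prefix, name, prefix, ns)
}

// flowMessage builds the log message from the two endpoints, so empty metadata
// on either side is handled in one place instead of per combination.
func flowMessage(nodeName, srcIP, srcN, srcNS string, srcPort int,
	destIP, destN, destNS string, destPort int, protocol, verdict string) string {
	return fmt.Sprintf("Node: %s;%s;SPORT: %d;%s;DPORT: %d;PROTOCOL: %s;PolicyVerdict: %s",
		nodeName,
		endpointFields("S", srcIP, srcN, srcNS), srcPort,
		endpointFields("D", destIP, destN, destNS), destPort,
		protocol, verdict)
}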

Resolved review thread: go.mod
log.Info("Flow Info: ", "Src IP", srcIP, "Src Name", srcName, "Src Namespace", srcName, "Src Port", srcPort,
"Dest Ext Srv IP", destIP, "Dest Port", destPort, "Proto", protocol, "Verdict", verdict)
*message = "Node: " + nodeName + ";" + "SIP: " + srcIP + ";" + "SN" + srcName + ";" + "SNS" + srcName + ";" + "SPORT: " + strconv.Itoa(srcPort) + ";" +
"DEXTSRVIP: " + destIP + ";" + "DPORT: " + strconv.Itoa(destPort) + ";" + "PROTOCOL: " + protocol + ";" + "PolicyVerdict: " + verdict

Contributor:
nit - "DEXTSRVIP" -> "DESTSRVIP"
